Optimized messenger RNA

ABSTRACT

The present invention is directed to a synthetic nucleic acid sequence which encodes a protein wherein at least one non-common codon or less-common codon is replaced by a common codon. The synthetic nucleic acid sequence can include a continuous stretch of at least 90 codons all of which are common codons.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. Ser. No. 09/686,497, filed Oct. 11, 2000, which is a continuation-in-part of U.S. Ser. No. 09/407,605 (now U.S. Pat. No. 6,924,365), filed Sep. 28, 1999, which claims the benefit of prior U.S. provisional application 60/102,239, filed Sep. 29, 1998, and prior U.S. provisional application 60/130,241, filed Apr. 20, 1999, the contents of which are herein incorporated by reference.

FIELD OF THE INVENTION

The invention is directed to methods for optimizing the properties of mRNA molecules, optimized mRNA molecules, methods of using optimized mRNA molecules, and compositions which include optimized mRNA molecules.

BACKGROUND OF THE INVENTION

In eukaryotes, gene expression is affected, in part, by the stability and structure of the messenger RNA (mRNA) molecule. mRNA stability influences gene expression by affecting the steady-state level of the mRNA. It can affect the rates at which the mRNA disappears following transcriptional repression and accumulates following transcriptional induction. The structure and nucleotide sequence of the mRNA molecule can also influence the efficiency with which individual mRNA molecules are translated.

The intrinsic stability of a given mRNA molecule is influenced by a number of specific internal sequence elements which can exert a destabilizing effect on the mRNA. These elements may be located in any region of the transcript, e.g., in the 5′ untranslated region (5′UTR), in the coding region, or in the 3′ untranslated region (3′UTR). It is well established that shortening of the poly(A) tail initiates mRNA decay (Ross, Trends in Genetics, 12:171-175, 1996). The poly(A) tract influences cytoplasmic mRNA stability by protecting mRNA from rapid degradation. Adenosine- and uridine-rich elements (AUREs) in the 3′UTR are also associated with unstable mammalian mRNAs, and proteins that bind to these elements, AURE-binding proteins (AUBPs), have been shown to affect mRNA stability. The coding region can also alter the half-life of many RNAs; for example, it can interact with proteins that protect it from endonucleolytic attack. Furthermore, the efficiency with which individual mRNA molecules are translated has a strong influence on the stability of the mRNA molecule (Herrick et al., Mol. Cell. Biol. 10:2269-2284, 1990; Hoekema et al., Mol. Cell. Biol. 7:2914-2924, 1987).

The single-stranded nature of mRNA allows it to adopt secondary and tertiary structure in a sequence-dependent manner through complementary base pairing. Examples of such structures include RNA hairpins, stem loops, and more complex structures such as bifurcations, pseudoknots, and triple helices. These structures influence both mRNA stability (e.g., stem-loop elements in the 3′UTR can serve as endonuclease cleavage sites) and translational efficiency.

In addition to the structure of the mRNA, the nucleotide content of the mRNA can also play a role in the efficiency with which the mRNA is translated. For example, mRNA with a high GC content in the 5′ untranslated region (UTR) may be translated with low efficiency, and reduced translational efficiency can in turn reduce message stability. Thus, altering the sequence of an mRNA molecule can ultimately influence mRNA transcript stability by influencing the translational efficiency of the message.
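GC content, the quantity referred to in the preceding paragraph, is the fraction of G and C bases in a region. A minimal Python sketch follows; the function name `gc_content` is illustrative, and because the text does not say what fraction counts as "high", no threshold is assumed.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases; accepts DNA (T) or RNA (U) letters."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    if set(seq) - set("ACGTU"):
        raise ValueError("unexpected characters in sequence")
    # Each G or C counts once; A, T and U do not contribute.
    return sum(base in "GC" for base in seq) / len(seq)


# Example on a short, hypothetical GC-rich 5' UTR fragment.
print(round(gc_content("GGCGCCAUGG"), 2))  # prints 0.8
```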

Factor VIII and Factor IX are important plasma proteins that participate in the intrinsic pathway of blood coagulation. Their dysfunction or absence can result in blood coagulation disorders; e.g., a deficiency of Factor VIII or Factor IX results in hemophilia A or B, respectively. Isolating Factor VIII or Factor IX from blood is difficult: the isolation of Factor VIII, for example, is characterized by low yields and carries the associated danger of contamination with infectious agents such as hepatitis B virus, hepatitis C virus, or HIV. Recombinant DNA technology provides an alternative method for producing biologically active Factor VIII or Factor IX. While these methods have had some success, improving the yield of Factor VIII or Factor IX is still a challenge.

One approach to increasing protein yield using recombinant DNA technology is to modify the coding sequence of a protein of interest, e.g., Factor VIII or Factor IX, without altering the amino acid sequence of the gene product. This approach involves altering, for example, the native Factor VIII or Factor IX gene sequence such that codons which are infrequently used in mammalian cells are replaced with codons which are overrepresented in highly expressed mammalian genes. Seed et al. (WO 98/12207) used this approach with a measure of success: they found that substituting rare mammalian codons with codons frequently used in mammalian cells resulted in a fourfold increase in Factor VIII production from mammalian cells.

SUMMARY OF THE INVENTION

In one aspect, the invention features a synthetic nucleic acid sequence which encodes a protein, or a portion thereof, wherein at least one non-common codon or less-common codon has been replaced by a common codon, and wherein the synthetic nucleic acid sequence includes a continuous stretch of at least 90 codons all of which are common codons.
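The continuous-stretch criterion can be checked mechanically by splitting the coding sequence into codons and measuring the longest uninterrupted run of common codons. The sketch below is a minimal illustration that assumes a hypothetical set of common codons (one per amino acid, chosen only for the example); the common codons actually intended are defined elsewhere in the specification and should be substituted. The function names are also illustrative.

```python
# HYPOTHETICAL common-codon set, for illustration only: one codon per amino
# acid. Substitute the common codons defined in the specification.
COMMON_CODONS = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GAG", "GGC", "CAC", "ATC",
    "CTG", "AAG", "ATG", "TTC", "CCC", "AGC", "ACC", "TGG", "TAC", "GTG",
}


def split_codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def longest_common_run(seq: str) -> int:
    """Length, in codons, of the longest uninterrupted run of common codons."""
    best = current = 0
    for codon in split_codons(seq):
        # Extend the current run on a common codon; reset it otherwise.
        current = current + 1 if codon in COMMON_CODONS else 0
        best = max(best, current)
    return best


def has_common_stretch(seq: str, min_codons: int = 90) -> bool:
    """True if the sequence contains at least `min_codons` consecutive common codons."""
    return longest_common_run(seq) >= min_codons
```

For example, `has_common_stretch("ATG" + "CTG" * 95)` returns `True` (96 consecutive codons from the assumed set). Dividing `longest_common_run(seq)` by the total number of codons gives the fraction used in the 33% criterion of the next aspect.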

The synthetic nucleic acid can direct the synthesis of an optimized messenger RNA (mRNA). In a preferred embodiment, the continuous stretch of common codons can include: the sequence of a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the “pre” sequence of a pre-pro-protein; the “pre-pro” sequence of a pre-pro-protein; the “pro” sequence of a pre-pro- or a pro-protein; or a portion of any of the aforementioned sequences.

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous stretch of at least 90, 95, 100, 125, 150, 200, 250, 300 or more codons all of which are common codons.

In another preferred embodiment, the nucleic acid sequence encoding a protein has at least 30, 50, 60, 75, 100, 200 or more non-common or less-common codons replaced with a common codon.

In a preferred embodiment, the number of non-common or less-common codons replaced is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In a preferred embodiment, the number of non-common or less-common codons remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In preferred embodiments, the non-common and less-common codons replaced, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.
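The "replaced" and "remaining" counts and percentages in the preceding embodiments can be computed by comparing the non-optimized and optimized sequences codon by codon. The following is a minimal sketch that assumes a hypothetical common-codon set (an illustrative placeholder, not the specification's definition) and two sequences of equal length in the same reading frame. Because the placeholder set only distinguishes common from not common, non-common and less-common codons are counted together, as in the embodiments above.

```python
# HYPOTHETICAL common-codon set, for illustration only.
COMMON_CODONS = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GAG", "GGC", "CAC", "ATC",
    "CTG", "AAG", "ATG", "TTC", "CCC", "AGC", "ACC", "TGG", "TAC", "GTG",
}


def _codons(seq: str) -> list[str]:
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_stats(original: str, optimized: str) -> dict:
    """Count codons replaced by, and codons still not, common codons."""
    if not original or len(original) != len(optimized) or len(original) % 3:
        raise ValueError("sequences must be non-empty, in frame and of equal length")
    before, after = _codons(original), _codons(optimized)
    n = len(after)
    # "Replaced": not a common codon before, a common codon after.
    replaced = sum(1 for a, b in zip(before, after)
                   if a not in COMMON_CODONS and b in COMMON_CODONS)
    # "Remaining": still not a common codon after optimization.
    remaining = sum(1 for b in after if b not in COMMON_CODONS)
    return {
        "codons": n,
        "replaced": replaced,
        "remaining": remaining,
        "pct_replaced": 100 * replaced / n,
        "pct_remaining": 100 * remaining / n,
    }
```

With this sketch, the embodiment in which the remaining non-common and less-common codons make up 6% or less of the sequence corresponds to `codon_stats(orig, opt)["pct_remaining"] <= 6`.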

In a preferred embodiment, all of the non-common or less-common codons of the synthetic nucleic acid sequence encoding a protein have been replaced with common codons.

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more amino acids in length.

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons in the synthetic nucleic acid sequence are common codons.

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human protein.

In another aspect, the invention features a synthetic nucleic acid sequence which encodes a protein, or a portion thereof, wherein at least one non-common codon or less-common codon has been replaced by a common codon, and wherein the synthetic nucleic acid sequence includes a continuous stretch of common codons, which continuous stretch includes at least 33% of the codons in the synthetic nucleic acid sequence.

The synthetic nucleic acid can direct the synthesis of an optimized messenger RNA (mRNA). In a preferred embodiment, the continuous stretch of common codons can include: the sequence of a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the “pre” sequence of a pre-pro-protein; the “pre-pro” sequence of a pre-pro-protein; the “pro” sequence of a pre-pro- or a pro-protein; or a portion of any of the aforementioned sequences.

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous stretch of common codons wherein the continuous stretch includes at least 35%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or 100% of the codons in the synthetic nucleic acid sequence.

In a preferred embodiment, the number of non-common or less-common codons replaced is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In a preferred embodiment, the number of non-common or less-common codons remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In preferred embodiments, the non-common and less-common codons replaced, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In a preferred embodiment, all of the non-common or less-common codons of the synthetic nucleic acid sequence encoding a protein have been replaced with common codons.

In a preferred embodiment, all non-common and less-common codons are replaced with common codons.

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more amino acids in length.

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons in the synthetic nucleic acid sequence are common codons.

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human protein.

In another aspect, the invention features a synthetic nucleic acid sequence which encodes a protein, or a portion thereof, wherein at least one non-common codon or less-common codon has been replaced by a common codon, and wherein the number of non-common and less-common codons, taken together, is less than n/x, wherein n/x is a positive integer, n is the number of codons in the synthetic nucleic acid sequence, and x is chosen from 2, 4, 6, 10, 15, 20, 50, 150, 250, 500 and 1000. (Fractional values of n/x are rounded to the next highest or next lowest integer: fractional parts below 0.5 are rounded down and fractional parts above 0.5 are rounded up.)
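The n/x criterion involves a rounding step that is easy to get wrong with floating-point division. The minimal sketch below uses exact rational arithmetic (`fractions.Fraction`) and follows the stated rule: fractional parts below 0.5 round down and those above 0.5 round up. The text does not say how an exact 0.5 is handled, so rounding it up here is an assumption; the function names are illustrative.

```python
import math
from fractions import Fraction

ALLOWED_X = (2, 4, 6, 10, 15, 20, 50, 150, 250, 500, 1000)


def rounded_limit(n: int, x: int) -> int:
    """n/x rounded as stated; an exact .5 is ASSUMED to round up."""
    if x not in ALLOWED_X:
        raise ValueError(f"x must be one of {ALLOWED_X}")
    q = Fraction(n, x)
    whole = math.floor(q)
    return whole + 1 if q - whole >= Fraction(1, 2) else whole


def meets_limit(non_common: int, n: int, x: int) -> bool:
    """True if non-common plus less-common codons number fewer than rounded n/x."""
    limit = rounded_limit(n, x)
    if limit < 1:
        raise ValueError("n/x rounds below 1; the criterion needs a positive integer")
    return non_common < limit


print(rounded_limit(300, 20), meets_limit(14, 300, 20))  # prints 15 True
```

For n = 300 codons and x = 20 the limit is 15, so a sequence with 14 or fewer non-common and less-common codons satisfies the criterion.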

The synthetic nucleic acid can direct the synthesis of an optimized messenger RNA (mRNA). In a preferred embodiment, the continuous stretch of common codons can include: the sequence of a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the “pre” sequence of a pre-pro-protein; the “pre-pro” sequence of a pre-pro-protein; the “pro” sequence of a pre-pro- or a pro-protein; or a portion of any of the aforementioned sequences.

In a preferred embodiment, the number of codons in the synthetic nucleic acid sequence (n) is at least 50, 60, 70, 80, 90, 100, 120, 150, 200, 350, 400, 500 or more.

In a preferred embodiment, the number of non-common or less-common codons replaced is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In a preferred embodiment, the number of non-common or less-common codons remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In preferred embodiments, the non-common and less-common codons replaced, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In a preferred embodiment, all non-common or less-common codons are replaced with common codons.

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons in the synthetic nucleic acid sequence are common codons.

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human protein.

In another aspect, the invention features a synthetic nucleic acid sequence which encodes a protein, or a portion thereof, wherein at least one non-common codon or less-common codon in the sequence that has not been optimized (non-optimized) which encodes the protein has been replaced by a common codon, wherein at least 94% of the codons in the sequence encoding the protein are common codons, and wherein the synthetic nucleic acid sequence encodes a protein of at least about 90, 100 or 120 amino acids in length.

The synthetic nucleic acid can direct the synthesis of an optimized messenger RNA (mRNA). In a preferred embodiment, the continuous stretch of common codons can include: the sequence of a pre-pro-protein; the sequence of a pro-protein; the sequence of a mature protein; the “pre” sequence of a pre-pro-protein; the “pre-pro” sequence of a pre-pro-protein; the “pro” sequence of a pre-pro- or a pro-protein; or a portion of any of the aforementioned sequences.

In preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, 99.5% or more of the non-common or less-common codons in the non-optimized nucleic acid sequence encoding the protein have been replaced by a common codon encoding the same amino acid. Preferably, all non-common or all less-common codons are replaced by a common codon encoding the same amino acid as found in the non-optimized sequence.

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more amino acids in length.

In other preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, or 99.5% of the non-common codons in the non-optimized nucleic acid sequence are replaced with common codons. Preferably, all of the non-common codons are replaced with common codons.

In other preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 98.5%, 99%, or 99.5% of the less-common codons in the non-optimized nucleic acid sequence are replaced with common codons. Preferably, all of the less-common codons are replaced with common codons.

In preferred embodiments, at least 94% of the non-common and less-common codons are replaced with common codons.

In preferred embodiments, the number of codons replaced which are not common codons is equal to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1.

In preferred embodiments, the number of codons remaining which are not common codons is equal to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1.

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human protein.

In a preferred embodiment, the synthetic nucleic acid sequence is at least 100, 110, 120, 150, 200, 300, 500, 700, 1000 or more base pairs in length.

In another aspect, the invention features a synthetic nucleic acid sequence that directs the synthesis of an optimized message which encodes a Factor VIII protein having one or more of the following characteristics:

a) the B domain is deleted (BDD Factor VIII);

b) the synthetic nucleic acid sequence has a recognition site for an intracellular protease of the PACE/furin class, e.g., X-Arg-X-X-Arg (Molloy et al., J. Biol. Chem. 267:16396-16401, 1992), and/or a short peptide linker, e.g., a two-amino-acid linker such as leucine-glutamic acid (LE), or a three- or four-amino-acid linker, inserted at the heavy-light chain junction;

c) the synthetic nucleic acid sequence is introduced into a cell, e.g., a primary cell, a secondary cell, or a transformed or an immortalized cell line. Examples of an immortalized human cell line useful in the present method include, but are not limited to: a Bowes Melanoma cell (ATCC Accession No. CRL 9607), a Daudi cell (ATCC Accession No. CCL 213), a HeLa cell and a derivative of a HeLa cell (ATCC Accession Nos. CCL 2, CCL 2.1, and CCL 2.2), an HL-60 cell (ATCC Accession No. CCL 240), an HT-1080 cell (ATCC Accession No. CCL 121), a Jurkat cell (ATCC Accession No. TIB 152), a KB carcinoma cell (ATCC Accession No. CCL 17), a K-562 leukemia cell (ATCC Accession No. CCL 243), an MCF-7 breast cancer cell (ATCC Accession No. HTB 22), a MOLT-4 cell (ATCC Accession No. 1582), a Namalwa cell (ATCC Accession No. CRL 1432), a Raji cell (ATCC Accession No. CCL 86), an RPMI 8226 cell (ATCC Accession No. CCL 155), a U-937 cell (ATCC Accession No. CRL 1593), WI-38VA13 subline 2R4 cells (ATCC Accession No. CCL 75.1), a CCRF-CEM cell (ATCC Accession No. CCL 119), and a 2780AD ovarian carcinoma cell (Van Der Blick et al., Cancer Res. 48:5927-5932, 1988), as well as heterohybridoma cells produced by fusion of human cells and cells of another species. In another embodiment, the immortalized cell line can be a cell line other than a human cell line, e.g., a CHO cell line or a COS cell line. In a preferred embodiment, the cell is a non-transformed cell. In a preferred embodiment, the cell can be from a clonal cell strain. In various preferred embodiments, the cell is a mammalian cell, e.g., a primary or secondary mammalian cell, e.g., a fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an epithelial cell, an endothelial cell, a glial cell, a neural cell, a cell comprising a formed element of the blood, a muscle cell and precursors of these somatic cells. In a most preferred embodiment, the cell is a secondary human fibroblast.
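Characteristic b) above refers to the PACE/furin recognition motif X-Arg-X-X-Arg. In one-letter amino acid code this is any residue, then R, any two residues, then R, which a regular expression can locate in a protein sequence. The sketch below is a plain pattern scan under that reading (the function and constant names are illustrative); it does not predict whether cleavage actually occurs, and the same scan applies to the Factor IX propeptide-mature protein junction discussed below.

```python
import re

# X-Arg-X-X-Arg in one-letter code. The zero-width lookahead lets
# overlapping matches be reported, not just non-overlapping ones.
FURIN_MOTIF = re.compile(r"(?=(.R..R))")


def find_furin_sites(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched pentapeptide) for each X-Arg-X-X-Arg match."""
    protein = protein.upper()
    return [(m.start(), m.group(1)) for m in FURIN_MOTIF.finditer(protein)]


print(find_furin_sites("GRAARG"))  # prints [(0, 'GRAAR')]
```

A designer checking a B-domain-deleted construct could, for example, translate the region around the heavy-light chain junction and confirm that the intended motif is present there and nowhere unexpected.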

In a preferred embodiment, the synthetic nucleic acid sequence which encodes a Factor VIII protein has at least one, preferably at least two, and most preferably all of the characteristics a, b, and c described above.

In preferred embodiments, at least one non-common codon or less-common codon of the synthetic nucleic acid has been replaced by a common codon, and the synthetic nucleic acid has one or more of the following properties: it has a continuous stretch of at least 90 codons all of which are common codons; it has a continuous stretch of common codons which comprises at least 33% of the codons of the synthetic nucleic acid sequence; at least 94% of the codons in the sequence encoding the protein are common codons and the synthetic nucleic acid sequence encodes a protein of at least about 90, 100, or 120 amino acids in length; or it is at least 80 base pairs in length and is free of unique restriction endonuclease sites that would occur in the message optimized sequence.

In a preferred embodiment, the number of non-common or less-common codons replaced is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In a preferred embodiment, the number of non-common or less-common codons remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In preferred embodiments, the non-common and less-common codons replaced, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In a preferred embodiment, all non-common or less-common codons are replaced with common codons.

In a preferred embodiment, all non-common and less-common codons are replaced with common codons.

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the codons in the synthetic nucleic acid sequence are common codons.

Preferably, all of the codons in the synthetic nucleic acid sequence are common codons.

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human protein.

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous stretch of common codons wherein the continuous stretch comprises at least 35%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or 100% of the codons in the synthetic nucleic acid sequence.

In another aspect, the invention features a synthetic nucleic acid sequence which can direct the synthesis of an optimized message which encodes a Factor IX protein having one or more of the following characteristics:

a) it has a PACE/furin recognition site, such as an X-Arg-X-X-Arg site, at the propeptide-mature protein junction; or

b) it is inserted, e.g., via transfection, into a non-transformed cell, e.g., a primary or secondary cell, e.g., a primary human fibroblast.

In a preferred embodiment, the synthetic nucleic acid sequence which encodes a Factor IX protein has at least one, and preferably both, of the characteristics a) and b) described above.

In preferred embodiments, at least one non-common codon or less-common codon of the synthetic nucleic acid has been replaced by a common codon, and the synthetic nucleic acid has one or more of the following properties: it has a continuous stretch of at least 90 codons all of which are common codons; it has a continuous stretch of common codons which comprises at least 33% of the codons of the synthetic nucleic acid sequence; at least 94% of the codons in the sequence encoding the protein are common codons and the synthetic nucleic acid sequence encodes a protein of at least about 90, 100, or 120 amino acids in length; or it is at least 80 base pairs in length and is free of unique restriction endonuclease sites that occur in the message optimized sequence.

In a preferred embodiment, the number of non-common or less-common codons replaced is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In a preferred embodiment, the number of non-common or less-common codons remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In preferred embodiments, the non-common and less-common codons replaced, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In a preferred embodiment, all non-common or less-common codons are replaced with common codons.

In a preferred embodiment, all non-common and less-common codons are replaced with common codons.

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the codons in the synthetic nucleic acid sequence are common codons.

Preferably, all of the codons in the synthetic nucleic acid sequence are common codons.

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human protein.

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous stretch of common codons wherein the continuous stretch comprises at least 35%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or 100% of the codons in the synthetic nucleic acid sequence.

In another aspect, the invention features a synthetic nucleic acid sequence which can direct the synthesis of an optimized message which encodes α-galactosidase.

In a preferred embodiment, the synthetic nucleic acid sequence which encodes α-galactosidase is inserted, e.g., via transfection, into a non-transformed cell, e.g., a primary or secondary cell, e.g., a primary human fibroblast.

In preferred embodiments, at least one non-common codon or less-common codon of the synthetic nucleic acid has been replaced by a common codon, and the synthetic nucleic acid has one or more of the following properties: it has a continuous stretch of at least 90 codons all of which are common codons; it has a continuous stretch of common codons which comprises at least 33% of the codons of the synthetic nucleic acid sequence; at least 94% of the codons in the sequence encoding the protein are common codons and the synthetic nucleic acid sequence encodes a protein of at least about 90, 100, or 120 amino acids in length; or it is at least 80 base pairs in length and is free of unique restriction endonuclease sites that occur in the message optimized sequence.

In a preferred embodiment, the number of non-common or less-common codons replaced is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In a preferred embodiment, the number of non-common or less-common codons remaining is less than 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1.

In preferred embodiments, the non-common and less-common codons replaced, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In preferred embodiments, the non-common and less-common codons remaining, taken together, are equal to or less than 6%, 5%, 4%, 3%, 2%, or 1% of the codons in the synthetic nucleic acid sequence.

In a preferred embodiment, all non-common or less-common codons are replaced with common codons.

In a preferred embodiment, all non-common and less-common codons are replaced with common codons.

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the codons in the synthetic nucleic acid sequence are common codons.

Preferably, all of the codons in the synthetic nucleic acid sequence are common codons.

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human protein.

In a preferred embodiment, the synthetic nucleic acid sequence includes a continuous stretch of common codons wherein the continuous stretch comprises at least 35%, 40%, 50%, 60%, 70%, 80%, 90%, 95% or 100% of the codons in the synthetic nucleic acid sequence.

In another aspect, the invention features a plasmid or a DNA construct, e.g., an expression plasmid or DNA construct, which includes a synthetic nucleic acid sequence described herein.

In yet another aspect, the invention features a synthetic nucleic acid sequence described herein introduced into the genome of an animal cell. In a preferred embodiment, the animal cell is a mammalian cell, e.g., a primate cell, e.g., a human cell.

In still another aspect, the invention features a cell harboring a synthetic nucleic acid sequence described herein, e.g., a cell from a primary or secondary cell strain, or a cell from a continuous cell line, e.g., a Bowes Melanoma cell (ATCC Accession No. CRL 9607), a Daudi cell (ATCC Accession No. CCL 213), a HeLa cell and a derivative of a HeLa cell (ATCC Accession Nos. CCL 2, CCL 2.1, and CCL 2.2), an HL-60 cell (ATCC Accession No. CCL 240), an HT-1080 cell (ATCC Accession No. CCL 121), a Jurkat cell (ATCC Accession No. TIB 152), a KB carcinoma cell (ATCC Accession No. CCL 17), a K-562 leukemia cell (ATCC Accession No. CCL 243), an MCF-7 breast cancer cell (ATCC Accession No. HTB 22), a MOLT-4 cell (ATCC Accession No. 1582), a Namalwa cell (ATCC Accession No. CRL 1432), a Raji cell (ATCC Accession No. CCL 86), an RPMI 8226 cell (ATCC Accession No. CCL 155), a U-937 cell (ATCC Accession No. CRL 1593), a WI-38VA13 subline 2R4 cell (ATCC Accession No. CCL 75.1), a CCRF-CEM cell (ATCC Accession No. CCL 119), and a 2780AD ovarian carcinoma cell (Van Der Blick et al., Cancer Res. 48:5927-5932, 1988), as well as heterohybridoma cells produced by fusion of human cells and cells of another species. In another embodiment, the immortalized cell line can be a cell line other than a human cell line, e.g., a CHO cell line or a COS cell line. In a preferred embodiment, the cell is a non-transformed cell. In a preferred embodiment, the cell is from a clonal cell strain. In various preferred embodiments, the cell is a mammalian cell, e.g., a primary or secondary mammalian cell, e.g., a fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an epithelial cell, an endothelial cell, a glial cell, a neural cell, a cell comprising a formed element of the blood, a muscle cell and precursors of these somatic cells. In a most preferred embodiment, the cell is a secondary human fibroblast.

In another aspect, the invention features a method for preparing a synthetic nucleic acid sequence encoding a protein which is, preferably, at least 90 codons in length, e.g., a synthetic nucleic acid sequence described herein. The method includes identifying non-common and less-common codons in the non-optimized gene encoding the protein and replacing at least 94%, 95%, 96%, 97%, 98%, 99% or more of the non-common and less-common codons with a common codon encoding the same amino acid as the replaced codon. Preferably, all non-common and less-common codons are replaced with common codons.
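The preparation method just described (identify codons that are not common, then replace each with a common codon for the same amino acid) can be sketched in a few lines. The standard genetic code is built from the usual TCAG ordering. The choice of one "common" codon per amino acid in `PREFERRED` is a hypothetical stand-in for the common-codon definitions given elsewhere in the specification, and the function names are illustrative. This sketch implements the "preferably all" embodiment and verifies that the encoded protein is unchanged.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))

# HYPOTHETICAL choice of one common codon per amino acid (and stop).
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG",
    "M": "ATG", "F": "TTC", "P": "CCC", "S": "AGC", "T": "ACC", "W": "TGG",
    "Y": "TAC", "V": "GTG", "*": "TGA",
}
COMMON_CODONS = set(PREFERRED.values())


def codons_of(seq: str) -> list[str]:
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    return "".join(CODON_TO_AA[c] for c in codons_of(seq))


def optimize(seq: str) -> str:
    """Replace each codon outside COMMON_CODONS with the common codon for its amino acid."""
    optimized = "".join(
        c if c in COMMON_CODONS else PREFERRED[CODON_TO_AA[c]]
        for c in codons_of(seq)
    )
    # Synonymous replacement must leave the encoded protein unchanged.
    if translate(optimized) != translate(seq):
        raise RuntimeError("optimization changed the encoded protein")
    return optimized


print(optimize("ATGCTACGTTAA"))  # prints ATGCTGCGCTGA
```

Replacing only a fraction of the non-common codons (the 94% to 99% embodiments) would amount to skipping some positions in the loop; how those positions are chosen is not specified here.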

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more codons in length.

In preferred embodiments, the protein is expressed in a eukaryotic cell, e.g., a mammalian cell, e.g., a human cell, and the protein is a mammalian protein, e.g., a human protein.

In another aspect, the invention features a method for making a nucleic acid sequence which directs the synthesis of an optimized message of a protein of at least 90, 100, or 120 amino acids in length, e.g., a synthetic nucleic acid sequence described herein. The method includes: synthesizing at least two fragments of the nucleic acid sequence, wherein the two fragments encode adjoining portions of the protein and wherein both fragments are mRNA optimized, e.g., as described herein; and joining the two fragments such that a non-common codon is not created at a junction point, thereby making the mRNA optimized nucleic acid sequence.

In a preferred embodiment, the two fragments are joined together such that a unique restriction endonuclease site used to create the two fragments is not recreated at the junction point. In another preferred embodiment, the two fragments are joined together such that a unique restriction site is created.
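The junction conditions of the fragment-joining method can be sketched as a validation performed after joining. The codon or codons spanning the junction must be common codons and, optionally, a given restriction site must not be recreated across the junction. In this minimal sketch the common-codon set is again a hypothetical placeholder and `join_fragments` is an illustrative name. Codons lying wholly inside each fragment are assumed to be optimized already, so only the junction is checked.

```python
from typing import Optional

# HYPOTHETICAL common-codon set, for illustration only.
COMMON_CODONS = {
    "GCC", "CGC", "AAC", "GAC", "TGC", "CAG", "GAG", "GGC", "CAC", "ATC",
    "CTG", "AAG", "ATG", "TTC", "CCC", "AGC", "ACC", "TGG", "TAC", "GTG",
}


def join_fragments(left: str, right: str, forbidden_site: Optional[str] = None) -> str:
    """Join two optimized fragments, rejecting a junction that breaks the rules."""
    left, right = left.upper(), right.upper()
    if not left or not right:
        raise ValueError("both fragments must be non-empty")
    joined = left + right
    if len(joined) % 3:
        raise ValueError("joined sequence is not a whole number of codons")
    junction = len(left)  # index of the first base contributed by `right`
    # Codons containing the last base of `left` or the first base of `right`.
    for idx in {(junction - 1) // 3, junction // 3}:
        codon = joined[3 * idx:3 * idx + 3]
        if codon not in COMMON_CODONS:
            raise ValueError(f"non-common codon {codon} created at junction")
    if forbidden_site:
        site = forbidden_site.upper()
        # Every occurrence that spans the junction lies inside this window.
        start = max(0, junction - len(site) + 1)
        if site in joined[start:junction + len(site) - 1]:
            raise ValueError(f"restriction site {site} recreated at junction")
    return joined
```

In use, the recognition sequence of the enzyme used to generate the fragments (for example, GAATTC for EcoRI) would be passed as `forbidden_site`. The alternative embodiment, in which a unique restriction site is deliberately created at the junction, would invert that last check.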

In a preferred embodiment, the synthetic nucleic acid sequence encodes a protein of at least about 90, 95, 100, 105, 110, 120, 130, 150, 200, 500, 700, 1000 or more codons in length.

In a preferred embodiment, at least 3, 4, 5, 6, 7, 8, 9, 10 or more fragments of the nucleic acid sequence are synthesized.

In a preferred embodiment, the fragments are joined together by a fusion, e.g., a blunt-end fusion.

In various preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons in the synthetic nucleic acid sequence are common codons.

In preferred embodiments, the number of codons which are not common codons is equal to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1.

In preferred embodiments, each fragment is at least 30, 40, 50, 75, 100, 120, 150 or more codons in length.

In another aspect, the invention features a method of providing a subject, e.g., a human, with a protein. The method includes: providing a synthetic nucleic acid sequence that can direct the synthesis of an optimized message for a protein, e.g., a synthetic nucleic acid sequence described herein; introducing the synthetic nucleic acid sequence that directs the synthesis of an optimized message for a protein into the subject; and allowing the subject to express the protein, thereby providing the subject with the protein.

In preferred embodiments, the method further includes inserting the nucleic acid sequence that can direct the synthesis of an optimized message into a cell. The cell can be an autologous, allogeneic, or xenogeneic cell, but is preferably autologous. A preferred cell is a fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an epithelial cell, an endothelial cell, a glial cell, a neural cell, a cell comprising a formed element of the blood, a muscle cell, or a precursor of these somatic cells. The mRNA optimized synthetic nucleic acid sequence can be inserted into the cell ex vivo or in vivo. If inserted ex vivo, the cell can be introduced into the subject.

In preferred embodiments, at least 94%, 95%, 96%, 97%, 98%, 99%, or all of the codons in the synthetic nucleic acid sequence are common codons. Preferably, all of the codons in the synthetic nucleic acid sequence are common codons.

In preferred embodiments, the number of codons which are not common codons is equal to or less than 15, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1.

The invention also features synthetic nucleic acid fragments which encode a portion of a protein. Such synthetic nucleic acid fragments are similar to the synthetic nucleic acid sequences of the invention except that they encode only a portion of a protein. Such nucleic acid fragments preferably encode at least 50, 60, 70, 80, 100, 110, 120, 130, 150, 200, 300, 400, 500, or more contiguous amino acids of the protein.

The invention also features transfected or infected primary and secondary somatic cells of vertebrate origin, particularly of mammalian origin, e.g., of human, mouse, or rabbit origin, e.g., primary human cells, secondary human cells, or primary or secondary rabbit cells. The cells are transfected or infected with exogenous synthetic nucleic acid, e.g., DNA, described herein. The synthetic nucleic acid can encode a protein, e.g., a therapeutic protein, e.g., an enzyme, e.g., α-galactosidase, a cytokine, a hormone, an antigen, an antibody, a clotting factor, e.g., Factor VIII or Factor IX, or a regulatory protein. The invention also includes methods by which primary and secondary cells are transfected or infected to include exogenous synthetic DNA, methods of producing clonal cell strains or heterogenous cell strains, and methods of gene therapy in which the transfected or infected primary or secondary cells are used. The synthetic nucleic acid directs the synthesis of an optimized message, e.g., an optimized message as described herein.

The present invention includes primary and secondary somatic cells which have been transfected or infected with an exogenous synthetic nucleic acid described herein, which is stably integrated into their genomes or is expressed in the cells episomally. In preferred embodiments the cells are fibroblasts, keratinocytes, epithelial cells, endothelial cells, glial cells, neural cells, cells comprising a formed element of the blood, muscle cells, other somatic cells which can be cultured, or somatic cell precursors. The resulting cells are referred to, respectively, as transfected or infected primary cells and transfected or infected secondary cells. The exogenous synthetic DNA encodes a protein, or a portion thereof, e.g., a therapeutic protein (e.g., Factor VIII or Factor IX). In the embodiment in which the exogenous synthetic DNA encodes a protein, or a portion thereof, to be expressed by the recipient cells, the resulting protein can be retained within the cell, incorporated into the cell membrane or secreted from the cell. In this embodiment, the exogenous synthetic DNA encoding the protein is introduced into cells along with additional DNA sequences sufficient for expression of the exogenous synthetic DNA in the cells. The additional DNA sequences may be of viral or non-viral origin. Primary cells modified to express exogenous synthetic DNA are referred to herein as transfected or infected primary cells, which include cells removed from tissue and placed on culture medium for the first time. Secondary cells modified to express or render available exogenous DNA are referred to herein as transfected or infected secondary cells.

Primary and secondary cells transfected or infected by the subject method, e.g., cloned cell strains, can be seen to fall into three types or categories: 1) cells which do not, as obtained, make or contain the therapeutic protein, 2) cells which make or contain the therapeutic protein but in lower quantities than normal (in quantities less than the physiologically normal lower level) or in defective form, and 3) cells which make the therapeutic protein at physiologically normal levels, but are to be augmented or enhanced in their content or production. Examples of proteins that can be made by the present method include cytokines or clotting factors.

Exogenous synthetic DNA is introduced into primary or secondary cells by a variety of techniques. For example, a DNA construct which includes exogenous synthetic DNA encoding a therapeutic protein and additional DNA sequences necessary for expression in recipient cells can be introduced into primary or secondary cells by electroporation, microinjection, or other means (e.g., calcium phosphate precipitation, modified calcium phosphate precipitation, polybrene precipitation, liposome fusion, receptor-mediated DNA delivery). Alternatively, a vector, such as a retroviral or other vector which includes exogenous synthetic DNA, can be used, and cells can be genetically modified as a result of infection with the vector.

In addition to the exogenous synthetic DNA, transfected or infected primary and secondary cells may optionally contain DNA encoding a selectable marker, which is expressed and confers upon recipients a selectable phenotype, such as antibiotic resistance, resistance to a cytotoxic agent, nutritional prototrophy or expression of a surface protein. Its presence makes it possible to identify and select cells containing the exogenous DNA. A variety of selectable marker genes can be used, such as neo, gpt, dhfr, ada, pac, hyg, mdr and hisD.

Transfected or infected cells of the present invention are useful, as populations of transfected or infected primary cells or secondary cells, transfected or infected clonal cell strains, transfected or infected heterogenous cell strains, and as cell mixtures in which at least one representative cell of one of the three preceding categories of transfected or infected cells is present (e.g., the mixture of cells contains essentially transfected or infected primary or secondary cells and may include untransfected or uninfected primary or secondary cells), as a delivery system for treating an individual with an abnormal or undesirable condition which responds to delivery of a therapeutic protein, which is either: 1) a therapeutic protein (e.g., a protein which is absent, underproduced relative to the individual's physiologic needs, defective, or inefficiently or inappropriately utilized in the individual, e.g., Factor VIII or Factor IX); or 2) a therapeutic protein with novel functions, such as enzymatic or transport functions, e.g., α-galactosidase. In the method of the present invention of providing a therapeutic protein, transfected or infected primary cells or secondary cells, clonal cell strains or heterogenous cell strains are administered to an individual in whom the abnormal or undesirable condition is to be treated or prevented, in sufficient quantity and by an appropriate route, to express the exogenous synthetic DNA at physiologically relevant levels. A physiologically relevant level is one which either approximates the level at which the product is produced in the body or results in improvement of the abnormal or undesirable condition.

Clonal cell strains of transfected or infected secondary cells (referred to as transfected or infected clonal cell strains) expressing exogenous synthetic DNA (and, optionally, including a selectable marker gene) can be produced by the method of the present invention. The method includes the steps of: 1) providing a population of primary cells, obtained from the individual to whom the transfected or infected primary cells will be administered or from another source; 2) introducing into the primary cells or into secondary cells derived from primary cells a DNA construct which includes exogenous DNA as described above and the necessary additional DNA sequences described above, producing transfected or infected primary or secondary cells; 3) maintaining transfected or infected primary or secondary cells under conditions appropriate for their propagation; 4) identifying a transfected or infected primary or secondary cell; and 5) producing a colony from the transfected or infected primary or secondary cell identified in (4) by maintaining it under appropriate culture conditions until a desired number of cells is obtained. The desired number of clonal cells is a number sufficient to provide a therapeutically effective amount of product when administered to an individual, e.g., an individual with hemophilia A is provided with a population of cells that produce a therapeutically effective amount of Factor VIII, such that the condition is treated. The individual can also be, for example, an individual with hemophilia B or an individual with a deficiency of α-galactosidase, such as an individual with Fabry disease. The number of cells required for a given therapeutic dose depends on several factors, including the expression level of the protein, the condition of the host animal and the limitations associated with the implantation procedure. In general, the number of cells required for implantation is in the range of 1×10⁶ to 5×10⁹, and preferably 1×10⁸ to 5×10⁸.
In one embodiment of the method, the cell identified in (4) undergoes approximately 27 doublings (i.e., undergoes 27 cycles of cell growth and cell division) to produce 100 million clonal transfected or infected cells. In another embodiment of the method, exogenous synthetic DNA is introduced into genomic DNA by homologous recombination between DNA sequences present in the DNA construct and genomic DNA. In another embodiment, the exogenous synthetic DNA is present episomally in a transfected cell, e.g., a primary or secondary cell.
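
The figure of approximately 27 doublings follows from idealized exponential growth, in which a single founder cell yields 2^n cells after n doublings: 2^26 = 67,108,864 falls short of 100 million, while 2^27 = 134,217,728 (about 1.3×10⁸) exceeds it. The Python sketch below assumes no cell death or loss during expansion (an idealization not stated above) and uses a hypothetical helper name; it applies the same arithmetic to the cell-number ranges given above.

```python
def doublings_needed(target_cells: int, founder_cells: int = 1) -> int:
    """Smallest n with founder_cells * 2**n >= target_cells.

    Assumes ideal exponential growth: every cell divides once per doubling
    and no cells are lost. Integer arithmetic avoids floating-point log error.
    """
    n = 0
    while founder_cells * 2 ** n < target_cells:
        n += 1
    return n


for target in (10**6, 10**8, 5 * 10**8, 5 * 10**9):
    print(f"{target:.0e} cells: {doublings_needed(target)} doublings")
```

Under these assumptions a single founder cell needs 20, 27, 29 and 33 doublings to reach 1×10⁶, 1×10⁸, 5×10⁸ and 5×10⁹ cells, respectively.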

In one embodiment of producing a clonal population of transfected secondary cells, a cell suspension containing primary or secondary cells is combined with exogenous synthetic DNA encoding a therapeutic protein and DNA encoding a selectable marker, such as the neo gene. The two DNA sequences are present on the same DNA construct or on two separate DNA constructs. The resulting combination is subjected to electroporation, generally at 250-300 volts with a capacitance of 960 μF and an appropriate time constant (e.g., 14 to 20 msec) for cells to take up the DNA construct. In an alternative embodiment, microinjection is used to introduce the DNA construct into primary or secondary cells. In either embodiment, introduction of the exogenous DNA results in production of transfected primary or secondary cells. The exogenous synthetic DNA introduced into the cell can be stably integrated into genomic DNA or be present episomally in the cell.
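
For a capacitor-discharge electroporator that delivers an exponentially decaying pulse (an assumption; the pulse shape is not specified above), the time constant is τ = RC, where R is the resistance of the cell suspension in the cuvette. With C = 960 μF, a time constant of 14 to 20 msec therefore corresponds to a sample resistance of roughly 14.6 to 20.8 Ω. A short Python sketch of that arithmetic, with hypothetical function names:

```python
import math


def implied_resistance_ohms(tau_ms: float, capacitance_uF: float = 960.0) -> float:
    """R = tau / C for an exponentially decaying (RC) pulse."""
    return (tau_ms * 1e-3) / (capacitance_uF * 1e-6)


def voltage_at(t_ms: float, v0: float, tau_ms: float) -> float:
    """Pulse voltage after t_ms, V(t) = V0 * exp(-t / tau)."""
    return v0 * math.exp(-t_ms / tau_ms)


for tau in (14.0, 20.0):
    print(f"tau = {tau} ms -> R = {implied_resistance_ohms(tau):.1f} ohm")

# After one time constant the pulse has fallen to about 37% of its start value.
print(round(voltage_at(20.0, 300.0, 20.0), 1))  # 110.4
```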

In the method of producing heterogenous cell strains of the present invention, the same steps are carried out as described for production of a clonal cell strain, except that a single transfected primary or secondary cell is not isolated and used as the founder cell. Instead, two or more transfected primary or secondary cells are cultured to produce a heterogenous cell strain. In addition to two or more transfected primary or secondary cells, a heterogenous cell strain can also contain untransfected primary or secondary cells.

The methods described herein have wide applicability in treating abnormal or undesired conditions and can be used to provide a variety of proteins in an effective amount to an individual. For example, they can be used to provide secreted proteins (with either predominantly systemic or predominantly local effects, e.g., Factor VIII and Factor IX), membrane proteins (e.g., for imparting new or enhanced cellular responsiveness, facilitating removal of a toxic product or for marking or targeting to a cell) or intracellular proteins (e.g., for affecting gene expression or producing autocrine effects).

A method described herein is particularly advantageous in treating abnormal or undesired conditions in that it: 1) is curative (one gene therapy treatment has the potential to last a patient's lifetime); 2) allows precise dosing (the patient's cells continuously determine and deliver the optimal dose of the required protein based on physiologic demands, and the stably transfected or infected cell strains can be characterized extensively in vitro prior to implantation, leading to accurate predictions of long term function in vivo); 3) is simple to apply in treating patients; 4) eliminates issues concerning patient compliance (following a one-time gene therapy treatment, daily protein injections are no longer necessary); and 5) reduces treatment costs (since the therapeutic protein is synthesized by the patient's own cells, investment in costly protein production and purification is unnecessary).

As used herein, the term “optimized messenger RNA” refers to a synthetic nucleic acid sequence encoding a protein wherein at least one non-common codon or less-common codon in the sequence encoding the protein has been replaced with a common codon.

By “common codon” is meant the most common codon representing a particular amino acid in a human sequence. The codon frequency in highly expressed human genes is outlined below in Table 1. Common codons include: Ala (gcc); Arg (cgc); Asn (aac); Asp (gac); Cys (tgc); Gln (cag); Gly (ggc); His (cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc); Phe (ttc); Ser (agc); Thr (acc); Tyr (tac); Glu (gag); and Val (gtg) (see Table 1). “Less-common codons” are codons that occur frequently in humans but are not the common codon: Gly (ggg); Ile (att); Leu (ctc); Ser (tcc); Val (gtc); and Arg (agg). All codons other than common codons and less-common codons are “non-common codons”.
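
The definitions above translate directly into a lookup-based optimization. The Python sketch below is a minimal illustration, not the procedure used to design the sequences described herein: it classifies each codon as common, less-common or non-common using the lists above, and produces an optimized message by replacing every codon with the common codon for the same amino acid. ATG (Met) and TGG (Trp), which have no alternatives, are treated as common, and stop codons are left unchanged; both are assumptions. The names `classify`, `optimize` and `translate` are hypothetical.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)

# Common codon per amino acid (one-letter code), from the list above.
COMMON_BY_AA = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "E": "GAG", "V": "GTG",
    "M": "ATG", "W": "TGG",  # assumption: single-codon amino acids
}
COMMON = set(COMMON_BY_AA.values())
LESS_COMMON = {"GGG", "ATT", "CTC", "TCC", "GTC", "AGG"}


def split_codons(seq: str) -> list:
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    if len(seq) % 3:
        raise ValueError("sequence is not a whole number of codons")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def classify(codon: str) -> str:
    codon = codon.upper().replace("U", "T")
    if codon in COMMON:
        return "common"
    if codon in LESS_COMMON:
        return "less-common"
    return "non-common"


def optimize(seq: str) -> str:
    """Replace each codon with the common codon for the same amino acid."""
    out = []
    for codon in split_codons(seq):
        aa = GENETIC_CODE[codon]
        out.append(COMMON_BY_AA.get(aa, codon))  # stop codons ('*') kept as-is
    return "".join(out)


def translate(seq: str) -> str:
    return "".join(GENETIC_CODE[c] for c in split_codons(seq))


original = "ATGGCTAGATTATAA"
optimized = optimize(original)
print(optimized)  # ATGGCCCGCCTGTAA
assert translate(optimized) == translate(original)  # encoded protein unchanged
```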

TABLE 1
Codon Frequency in Highly Expressed Human Genes (% occurrence)
Ala: GCC 53, GCT 17, GCA 13, GCG 17
Arg: CGC 37, CGT 7, CGA 6, CGG 21, AGA 10, AGG 18
Asn: AAC 78, AAT 25
Leu: CTC 26, CTT 5, CTA 3, CTG 58, TTA 2, TTG 6
Lys: AAA 18, AAG 82
Pro: CCC 48, CCT 19, CCA 16, CCG 17
Phe: TTC 80, TTT 20
Cys: TGC 68, TGT 32
Gln: CAA 12, CAG 88
Glu: GAA 25, GAG 75
Gly: GGC 50, GGT 12, GGA 14, GGG 24
His: CAC 79, CAT 21
Ile: ATC 77, ATT 18, ATA 5
Ser: TCC 28, TCT 13, TCA 5, TCG 9, AGC 34, AGT 10
Thr: ACC 57, ACT 14, ACA 14, ACG 15
Tyr: TAC 74, TAT 26
Val: GTC 25, GTT 7, GTA 5, GTG 64

Codon frequency in Table 1 was calculated using the GCG program established by the University of Wisconsin Genetics Computer Group. Numbers represent the percentage of cases in which the particular codon is used.
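
Because a common codon is defined as the most frequent codon for its amino acid, the common codons can be recovered from Table 1 by taking, for each amino acid, the codon with the highest percentage. The Python sketch below encodes the rows of Table 1 (Asp does not appear in the table as extracted, so it is omitted here) and checks that the derived codons agree with the common codons listed in the definition; the data layout and function name are illustrative assumptions. The less-common codons do not follow from the same rule in every case: for Arg, CGG (21%) outranks the listed less-common codon AGG (18%).

```python
# Table 1: % occurrence of each codon in highly expressed human genes.
TABLE_1 = {
    "Ala": {"GCC": 53, "GCT": 17, "GCA": 13, "GCG": 17},
    "Arg": {"CGC": 37, "CGT": 7, "CGA": 6, "CGG": 21, "AGA": 10, "AGG": 18},
    "Asn": {"AAC": 78, "AAT": 25},
    "Leu": {"CTC": 26, "CTT": 5, "CTA": 3, "CTG": 58, "TTA": 2, "TTG": 6},
    "Lys": {"AAA": 18, "AAG": 82},
    "Pro": {"CCC": 48, "CCT": 19, "CCA": 16, "CCG": 17},
    "Phe": {"TTC": 80, "TTT": 20},
    "Cys": {"TGC": 68, "TGT": 32},
    "Gln": {"CAA": 12, "CAG": 88},
    "Glu": {"GAA": 25, "GAG": 75},
    "Gly": {"GGC": 50, "GGT": 12, "GGA": 14, "GGG": 24},
    "His": {"CAC": 79, "CAT": 21},
    "Ile": {"ATC": 77, "ATT": 18, "ATA": 5},
    "Ser": {"TCC": 28, "TCT": 13, "TCA": 5, "TCG": 9, "AGC": 34, "AGT": 10},
    "Thr": {"ACC": 57, "ACT": 14, "ACA": 14, "ACG": 15},
    "Tyr": {"TAC": 74, "TAT": 26},
    "Val": {"GTC": 25, "GTT": 7, "GTA": 5, "GTG": 64},
}

# Common codons as listed in the definition (Asp omitted to match the table).
LISTED_COMMON = {
    "Ala": "GCC", "Arg": "CGC", "Asn": "AAC", "Leu": "CTG", "Lys": "AAG",
    "Pro": "CCC", "Phe": "TTC", "Cys": "TGC", "Gln": "CAG", "Glu": "GAG",
    "Gly": "GGC", "His": "CAC", "Ile": "ATC", "Ser": "AGC", "Thr": "ACC",
    "Tyr": "TAC", "Val": "GTG",
}


def most_frequent_codons(table: dict) -> dict:
    """For each amino acid, return the codon with the highest % occurrence."""
    return {aa: max(freqs, key=freqs.get) for aa, freqs in table.items()}


derived = most_frequent_codons(TABLE_1)
print(derived == LISTED_COMMON)  # True
```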

The term “primary cell” includes cells present in a suspension of cells isolated from a vertebrate tissue source (prior to their being plated, i.e., attached to a tissue culture substrate such as a dish or flask), cells present in an explant derived from tissue, both of the previous types of cells plated for the first time, and cell suspensions derived from these plated cells. The term “secondary cell” or “cell strain” refers to cells at all subsequent steps in culturing. That is, the first time a plated primary cell is removed from the culture substrate and replated (passaged), it is referred to herein as a secondary cell, as are all cells in subsequent passages. Secondary cells are cell strains which consist of secondary cells which have been passaged one or more times. A cell strain consists of secondary cells that: 1) have been passaged one or more times; 2) exhibit a finite number of mean population doublings in culture; 3) exhibit the properties of contact-inhibited, anchorage-dependent growth (anchorage-dependence does not apply to cells that are propagated in suspension culture); and 4) are not immortalized. A “clonal cell strain” is defined as a cell strain that is derived from a single founder cell. A “heterogenous cell strain” is defined as a cell strain that is derived from two or more founder cells.

The term “transfected cell” refers to a cell into which an exogenous synthetic nucleic acid sequence, e.g., a sequence which encodes a protein, is introduced. Once in the cell, the synthetic nucleic acid sequence can integrate into the recipient cell's chromosomal DNA or can exist episomally. Standard transfection methods can be used to introduce the synthetic nucleic acid sequence into a cell, e.g., liposome-mediated, polybrene-mediated or DEAE-dextran-mediated transfection, electroporation, calcium phosphate precipitation or microinjection. The term “transfection” does not include delivery of DNA or RNA into a cell by a virus. The term “infected cell” refers to a cell into which an exogenous synthetic nucleic acid sequence, e.g., a sequence which encodes a protein, is introduced by a virus. Viruses known to be useful for gene transfer include an adenovirus, an adeno-associated virus, a herpes virus, a mumps virus, a poliovirus, a retrovirus, a Sindbis virus, a lentivirus and a vaccinia virus such as a canary pox virus. Other features and advantages of the invention will be apparent from the following detailed description and the claims.

DETAILED DESCRIPTION OF THE INVENTION

The drawings are first briefly described.

FIG. 1 is a schematic representation of domain structures of full-length and B-domain deleted human Factor VIII (hFVIII).

FIG. 2 is a schematic representation of full-length hFVIII.

FIG. 3 is a schematic representation of 5R BDD hFVIII expression plasmid pXF8.186.

FIG. 4 is a schematic representation of LE BDD hFVIII expression plasmid pXF8.61.

FIG. 5 is a schematic representation of the fourteen fragments (Fragment A-Fragment N) assembled to construct pXF8.61. (Coding and non-coding strands are SEQ ID NOs:107-120 and 121-134, respectively).

FIG. 6 is a schematic representation of the assembly of pXF8.61.

FIG. 7 depicts the nucleotide sequence and the corresponding amino acid sequence of the LE B-domain-deleted-Factor VIII (FVIII) insert contained in pAM1-1 (SEQ ID NOs:1 and 3, respectively).

FIG. 8 is a schematic representation of the fragments assembled to construct pXF8.186. (Coding and non-coding strands are SEQ ID NOs:135 and 136, respectively).

FIG. 9 depicts the nucleotide sequence and the corresponding amino acid sequence of the 5Arg B-domain-deleted-FVIII insert (SEQ ID NOs:2 and 4, respectively).

FIG. 10 is a schematic representation of the Factor VIII expression plasmid, pXF8.36. The cytomegalovirus immediate early I (CMV) promoter is depicted as a lightly shaded box. Positions of splice donor (SD) and splice acceptor (SA) sites are indicated below the shaded box. The Factor VIII cDNA sequence is depicted as a solid dark box. The hGH 3′UTS region is depicted as an open box. The neo expression cassette is depicted as a shaded box with an arrowhead which corresponds to the direction of transcription. The thin dark line represents the plasmid backbone sequences. The position and direction of transcription of the β-lactamase gene (amp) is indicated by the solid boxed arrow.

FIG. 11 is a schematic representation of the Factor VIII expression plasmid, pXF8.38. The cytomegalovirus immediate early I (CMV) promoter is depicted as a lightly shaded box. Positions of splice donor (SD) and splice acceptor (SA) sites are indicated below the shaded box. The Factor VIII cDNA sequence is depicted as a solid dark box. The hGH 3′UTS region is depicted as an open box. The neo expression cassette is depicted as a shaded box with an arrowhead which corresponds to the direction of transcription. The thin dark line represents the plasmid backbone sequences. The position and direction of transcription of the β-lactamase gene (amp) is indicated by the solid boxed arrow.

FIG. 12 is a schematic representation of the Factor VIII expression plasmid, pXF8.269. The collagen (I) α2 promoter is depicted as a striped box. The region representing aldolase-derived 5′ untranslated sequences is depicted as a lightly shaded box. Positions of splice donor (SD) and splice acceptor (SA) sites are indicated below the shaded box. The Factor VIII cDNA sequence is depicted as a solid dark box. The hGH 3′UTS region is depicted as an open box. The neo expression cassette is depicted as a shaded box with an arrowhead which corresponds to the direction of transcription. The thin dark line represents the plasmid backbone sequences. The position and direction of transcription of the β-lactamase gene (amp) is indicated by the solid boxed arrow.

FIG. 13 is a schematic representation of the Factor VIII expression plasmid, pXF8.224. The collagen (I) α2 promoter is depicted as a striped box. The region representing aldolase-derived 5′ untranslated sequences is depicted as a lightly shaded box. Positions of splice donor (SD) and splice acceptor (SA) sites are indicated below the shaded box. The Factor VIII cDNA sequence is depicted as a solid dark box. The hGH 3′UTS region is depicted as an open box. The neo expression cassette is depicted as a shaded box with an arrowhead which corresponds to the direction of transcription. The thin dark line represents the plasmid backbone sequences. The position and direction of transcription of the β-lactamase gene (amp) is indicated by the solid boxed arrow.

FIG. 14 is a schematic representation of the fragments assembled to construct pFIXABCD. The restriction sites that are cut are in bold and the junctions from the last step are underlined. The direction of transcription of the FIXABCD sequence is indicated by the solid black arrow.

FIG. 15 depicts the nucleotide sequence of the FIXABCD insert (SEQ ID NO:105).

FIG. 16 is a schematic representation of the Factor IX expression plasmids pXIX76 and pXIX170. The arrows inside the circle denote open reading frames. Arrows on the circle denote promoter sequences; a double-headed arrow denotes an enhancer. Thin lines denote bacterial vector sequences or introns and thick boxes delineate the translated sequence. Double lines denote untranscribed genomic sequences, while lines of intermediate thickness denote untranslated portions of the mRNA. Plasmid pXIX170 has a Factor IX cDNA sequence that is optimized, while pXIX76 does not.

FIG. 17 depicts the nucleotide sequence of the α-galactosidase insert (SEQ ID NO:106).

FIG. 18 is a schematic representation of the α-galactosidase expression plasmids pXAG94 and pXAG95. The arrows inside the circle denote open reading frames. Arrows on the circle denote promoter sequences; a double-headed arrow denotes an enhancer. Thin lines denote bacterial vector sequences or introns and thick boxes delineate the translated sequence. Double lines denote untranscribed genomic sequences, while lines of intermediate thickness denote untranslated portions of the mRNA. Plasmid pXAG95 has an α-galactosidase cDNA sequence that is optimized, while pXAG94 does not.

FIG. 19 is a schematic representation of the α-galactosidase expression plasmids pXAG73 and pXAG74. The arrows inside the circle denote open reading frames. Arrows on the circle denote promoter sequences; a double-headed arrow denotes an enhancer. Thin lines denote bacterial vector sequences or introns and thick boxes delineate the translated sequence. Double lines denote untranscribed genomic sequences, while lines of intermediate thickness denote untranslated portions of the mRNA. Plasmid pXAG74 has an α-galactosidase cDNA sequence that is optimized, while pXAG73 does not.

MESSAGE OPTIMIZATION

Methods of the invention are directed to optimized messages and synthetic nucleic acid sequences which direct the production of optimized mRNAs. An optimized mRNA can direct the synthesis of a protein of interest, e.g., a human protein, e.g., human Factor VIII, human Factor IX or human α-galactosidase. A message for a protein of interest, e.g., human Factor VIII, human Factor IX or human α-galactosidase, can be optimized as described herein, e.g., by replacing at least 94%, 95%, 96%, 97%, 98%, 99%, and preferably all, of the non-common codons or less-common codons with a common codon encoding the same amino acid, as outlined in Table 1.

The coding region of a synthetic nucleic acid sequence can include the sequence “cg” without restriction, if the sequence is found in the common codon for that amino acid. Alternatively, the sequence “cg” can be limited in various regions, e.g., the first 20% of the coding sequence can be designed to have a low incidence of the sequence “cg”.
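
Whether a design meets a goal such as a low incidence of “cg” in the first 20% of the coding sequence can be checked by locating every CG dinucleotide, including those that span a codon boundary (e.g., TAC followed by GAG). The Python sketch below is illustrative: the 20% cut-off comes from the example above, while the function names and the rule of assigning a dinucleotide to the region where its C lies are assumptions.

```python
def cg_positions(seq: str) -> list:
    """0-based positions of every CG dinucleotide (C at position i)."""
    seq = seq.upper().replace("U", "T")
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cg_split(seq: str, leading_fraction: float = 0.2) -> tuple:
    """Count CG dinucleotides in the leading fraction vs. the remainder.

    A dinucleotide is assigned to the leading region if its C lies there.
    """
    cut = int(len(seq) * leading_fraction)
    positions = cg_positions(seq)
    leading = sum(p < cut for p in positions)
    return leading, len(positions) - leading


print(cg_positions("TACGAG"))  # [2]: a CG spanning a codon boundary
print(cg_split("CGA" * 10))    # (2, 8): 30 nt, leading region is nt 0-5
```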

Optimizing a message (and its synthetic DNA sequence) can negatively or positively affect gene expression or protein production. For example, replacing a less-common codon with a more common codon may affect the half-life of the mRNA or alter its structure by introducing a secondary structure that interferes with translation of the message. It may therefore be necessary, in certain instances, to alter the optimized message.

All or a portion of a message (or its gene) can be optimized. In some cases the desired modulation of expression is achieved by optimizing essentially the entire message. In other cases, the desired modulation will be achieved by optimizing part but not all of the message or gene.

The codon usage of any coding sequence can be adjusted to achieve a desired property, for example high levels of expression in a specific cell type. The starting point for such an optimization may be a coding sequence with 100% common codons, or a coding sequence which contains a mixture of common and non-common codons.

Two or more candidate sequences that differ in their codon usage are generated and tested to determine if they possess the desired property. Candidate sequences may be evaluated initially by using a computer to search for the presence of regulatory elements, such as silencers or enhancers, and to search for the presence of regions of coding sequence which could be converted into such regulatory elements by an alteration in codon usage. Additional criteria may include enrichment for particular nucleotides, e.g., A, C, G or U, codon bias for a particular amino acid, or the presence or absence of particular mRNA secondary or tertiary structure. Adjustments to the candidate sequence can be made based on a number of such criteria.
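
A first-pass computational screen of candidate sequences might report nucleotide composition and the positions of sequence motifs of interest. The Python sketch below is a minimal illustration under stated assumptions: the motif list is supplied by the user, and ATTTA (the DNA form of the AUUUA pentamer found in many AU-rich elements) is used only as an example. It does not predict enhancers, silencers, or secondary structure, which would require dedicated tools. Function names are hypothetical.

```python
import re


def base_composition(seq: str) -> dict:
    """Fraction of A, C, G and T in a DNA sequence."""
    seq = seq.upper().replace("U", "T")
    return {base: seq.count(base) / len(seq) for base in "ACGT"}


def find_motifs(seq: str, motifs: list) -> dict:
    """Start positions of each motif, including overlapping occurrences."""
    seq = seq.upper().replace("U", "T")
    return {
        motif: [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]
        for motif in motifs
    }


candidate = "ATGATTTATTTAGCC"
comp = base_composition(candidate)
print(round(comp["G"] + comp["C"], 3))     # 0.267 (GC fraction)
print(find_motifs(candidate, ["ATTTA"]))  # {'ATTTA': [3, 7]}
```

The zero-width lookahead in the pattern lets overlapping occurrences (such as the two ATTTA copies sharing an A above) both be reported.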

Promising candidate sequences are constructed and then evaluated experimentally. Multiple candidates may be evaluated independently of each other, or the process can be iterative, either by using the most promising candidate as a new starting point, or by combining regions of two or more candidates to produce a novel hybrid. Further rounds of modification and evaluation can be included.

Modifying the codon usage of a candidate sequence can result in the creation or destruction of either a positive or negative element. In general, a positive element refers to any element whose alteration or removal from the candidate sequence could result in a decrease in expression of the therapeutic protein, or whose creation could result in an increase in expression of a therapeutic protein. For example, a positive element can include an enhancer, a promoter, a downstream promoter element, a DNA binding site for a positive regulator (e.g., a transcriptional activator), or a sequence responsible for imparting or removing mRNA secondary or tertiary structure. A negative element refers to any element whose alteration or removal from the candidate sequence could result in an increase in expression of the therapeutic protein, or whose creation would result in a decrease in expression of the therapeutic protein. A negative element includes a silencer, a DNA binding site for a negative regulator (e.g., a transcriptional repressor), a transcriptional pause site, or a sequence that is responsible for imparting or removing mRNA secondary or tertiary structure. In general, a negative element arises more frequently than a positive element. Thus, any change in codon usage that results in an increase in protein expression is more likely to have arisen from the destruction of a negative element than from the creation of a positive element. In addition, alteration of the candidate sequence is more likely to destroy a positive element than to create one. In one embodiment, a candidate sequence is chosen and modified so as to increase the production of a therapeutic protein. The candidate sequence can be modified, e.g., by sequentially altering the codons or by randomly altering the codons in the candidate sequence.
A modified candidate sequence is then evaluated by determining the level of expression of the resulting therapeutic protein or by evaluating another parameter, e.g., a parameter correlated to the level of expression. A candidate sequence which produces an increased level of a therapeutic protein as compared to an unaltered candidate sequence is chosen.
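
The sequential strategy can be sketched as a greedy pass over the codons: at each position, each synonymous alternative is tried and kept only if the measured level of expression (or a correlated parameter) increases. In the Python sketch below, `measure` is a hypothetical stand-in for the experimental or computational evaluation described above; the toy GC-fraction score in the usage line is for illustration only and is not presented as a proxy for expression. Stop codons are left unchanged (an assumption).

```python
from typing import Callable, Tuple

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

# Synonymous codons for each amino acid, in TCAG order.
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def sequential_codon_search(
    seq: str, measure: Callable[[str], float]
) -> Tuple[str, float]:
    """Greedy, position-by-position synonymous codon search."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    best_score = measure(seq)
    for i in range(len(codons)):
        aa = CODE[codons[i]]
        if aa == "*":
            continue  # leave stop codons untouched
        for alt in SYNONYMS[aa]:
            if alt == codons[i]:
                continue
            trial = codons[:i] + [alt] + codons[i + 1:]
            score = measure("".join(trial))
            if score > best_score:  # keep only strict improvements
                codons, best_score = trial, score
    return "".join(codons), best_score


def gc_fraction(s: str) -> float:  # toy score for the usage example only
    return (s.count("G") + s.count("C")) / len(s)


print(sequential_codon_search("ATGGCTAAA", gc_fraction)[0])  # ATGGCCAAG
```

Because each change is synonymous, the encoded protein is preserved throughout; only the order in which positions are visited and the choice of `measure` shape the result.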

In another approach, one or a group of codons can be modified, e.g., without reference to protein or message structure, and tested. Alternatively, one or more codons can be chosen on the basis of a message-level property, e.g., location in a region of predetermined, e.g., high or low, GC or AU content; location in a region having a structure such as an enhancer or silencer; location in a region that can be modified to introduce a structure such as an enhancer or silencer; location in a region having, or predicted to have, secondary or tertiary structure, e.g., intra-chain or inter-chain pairing; or location in a region lacking, or predicted to lack, secondary or tertiary structure, e.g., intra-chain or inter-chain pairing. A particular modified region is chosen if it produces the desired result.

Methods which systematically generate candidate sequences are useful. For example, one or a group of codons, e.g., a contiguous block of codons, at various positions of a synthetic nucleic acid sequence can be replaced with common codons (or with non-common codons if, for example, the starting sequence has been optimized) and the resulting sequence evaluated. Candidates can be generated by optimizing (or de-optimizing) a given “window” of codons in the sequence to generate a first candidate, then moving the window to a new position in the sequence and optimizing (or de-optimizing) the codons under the window at the new position to provide a second candidate. Candidates can be evaluated by determining the level of expression they provide, or by evaluating another parameter, e.g., a parameter correlated to the level of expression. Some parameters can be evaluated by inspection or computationally, e.g., the presence or absence of high or low GC or AU content; a sequence element such as an enhancer or silencer; or secondary or tertiary structure, e.g., intra-chain or inter-chain pairing.
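
The moving-window scheme can be sketched as follows. Each candidate recodes only the codons under a window of `window` codons, and the window advances by `step` codons between candidates. The Python sketch assumes the common-codon list defined herein as the recoding table (with ATG and TGG assumed common); passing a different amino-acid-to-codon table would de-optimize instead. Function and parameter names are illustrative.

```python
from typing import Dict, List, Tuple

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

# Common codon per amino acid, as defined herein (M and W: assumption).
COMMON_BY_AA = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC", "Q": "CAG",
    "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG", "K": "AAG", "P": "CCC",
    "F": "TTC", "S": "AGC", "T": "ACC", "Y": "TAC", "E": "GAG", "V": "GTG",
    "M": "ATG", "W": "TGG",
}


def window_candidates(
    seq: str, window: int, step: int, table: Dict[str, str] = COMMON_BY_AA
) -> List[Tuple[int, str]]:
    """Return (start codon index, candidate sequence) for each window position."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    candidates = []
    for start in range(0, len(codons) - window + 1, step):
        # Recode only the codons under the window; stop codons pass through.
        recoded = [table.get(CODE[c], c) for c in codons[start:start + window]]
        candidate = codons[:start] + recoded + codons[start + window:]
        candidates.append((start, "".join(candidate)))
    return candidates


for start, cand in window_candidates("GCTGCTGCTGCT", window=2, step=1):
    print(start, cand)
```

For four non-common Ala codons and a two-codon window moved one codon at a time, this yields three candidates: GCCGCCGCTGCT, GCTGCCGCCGCT and GCTGCTGCCGCC.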

Thus, hybrid messages, i.e., messages having a region which is optimized and a region which is not optimized, can be evaluated to determine if they have a desired property. The evaluation can be effected by, e.g., synthesizing the candidate message or messages and determining a property such as its level of expression. Such a determination can be made in a cell-free system or in a cell-based system. The generation and testing of one or more candidates can also be performed by computational methods, e.g., on a computer. For example, a computer program can be used to generate a number of candidate messages, and those messages can be analyzed by a computer program which predicts the existence of primary structure elements or secondary or tertiary structure.

A candidate message can be generated by dividing a region into subregions and optimizing each subregion. An optimized subregion is then combined with a non-optimized subregion to produce a candidate. For example, a region is divided into three subregions, a, b and c, each of which is then optimized to provide optimized subregions a′, b′ and c′. The optimized subregions a′, b′ and c′ can then be combined with one or more of the non-optimized subregions, e.g., a, b and c. For example, ab′c could be formed and tested. Different combinations of optimized and non-optimized subregions can be generated. By evaluating a series of such hybrid candidate sequences, it is possible to analyze the effect of modification of different subregions and, e.g., to define the particular version of each subregion that contributes most to the desired property. A preferred candidate can include the versions of each subregion that performed best in a series of such experiments.
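
With n subregions, each present in either its original or its optimized form, there are 2^n hybrid candidates (eight for the three subregions a, b and c, from abc to a′b′c′). A Python sketch that enumerates them, assuming the subregions are already split at codon boundaries; the labels use an ASCII apostrophe for the prime, and the function name is hypothetical:

```python
from itertools import product
from string import ascii_lowercase
from typing import Dict, List


def hybrid_candidates(original: List[str], optimized: List[str]) -> Dict[str, str]:
    """Map labels such as "ab'c" to the corresponding hybrid sequence."""
    if len(original) != len(optimized):
        raise ValueError("each subregion needs an original and an optimized form")
    candidates = {}
    for choice in product((False, True), repeat=len(original)):
        label = "".join(
            ascii_lowercase[i] + ("'" if use_opt else "")
            for i, use_opt in enumerate(choice)
        )
        candidates[label] = "".join(
            opt if use_opt else orig
            for orig, opt, use_opt in zip(original, optimized, choice)
        )
    return candidates


hybrids = hybrid_candidates(["GCTAGA", "TTAAAA", "GGTATT"],
                            ["GCCCGC", "CTGAAG", "GGCATC"])
print(len(hybrids))     # 8
print(hybrids["ab'c"])  # GCTAGACTGAAGGGTATT
```

Each hybrid could then be scored, and the best-performing version of each subregion combined into a preferred candidate.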

An algorithm for creating an optimized candidate sequence is as follows:

1. Provide a message sequence (an entire message or a portion thereof). Go to step 2.
2. Generate a novel candidate sequence by modifying the codon usage of a candidate sequence, e.g., the most promising candidate sequence previously identified, or by combining regions of two or more candidates previously identified to produce a novel hybrid. Go to step 3.
3. Evaluate the candidate sequence and determine whether it has a predetermined property. If the candidate has the predetermined property, proceed to step 4; otherwise, return to step 2.
4. Use the candidate sequence as an optimized message.
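The four-step procedure can be sketched as a generic loop. This is a sketch under stated assumptions: `mutate`, `score` and `has_property` are hypothetical callables standing in for step 2 (codon modification or hybrid formation) and step 3 (evaluation), and the `max_rounds` cap is an addition not present in the algorithm as written, which otherwise never terminates if no candidate qualifies. The example rule, replacing the first non-common codon with a common one until every codon is common, uses an illustrative `COMMON` table.

```python
# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
# ASSUMPTION: illustrative "common" codon per amino acid.
COMMON = {"A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
          "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
          "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
          "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG"}


def split_codons(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_common(codon):
    # Stop codons have no entry in COMMON and are treated as acceptable.
    return COMMON.get(CODE[codon], codon) == codon


def optimize_message(message, mutate, score, has_property, max_rounds=1000):
    """Steps 1-4: generate a candidate, evaluate it, repeat until one qualifies."""
    best, best_score = message, score(message)        # step 1
    for _ in range(max_rounds):                        # ASSUMPTION: iteration cap
        candidate = mutate(best)                       # step 2
        if has_property(candidate):                    # step 3
            return candidate                           # step 4
        candidate_score = score(candidate)
        if candidate_score > best_score:               # keep the most promising
            best, best_score = candidate, candidate_score
    return None


# Hypothetical step-2 rule: replace the first non-common codon.
def optimize_first_uncommon(seq):
    codons = split_codons(seq)
    for i, codon in enumerate(codons):
        if not is_common(codon):
            codons[i] = COMMON[CODE[codon]]
            break
    return "".join(codons)


def count_common(seq):
    return sum(is_common(c) for c in split_codons(seq))


result = optimize_message(
    "GCTCGTAAT",
    mutate=optimize_first_uncommon,
    score=count_common,
    has_property=lambda s: count_common(s) == len(split_codons(s)),
)
print(result)  # GCCCGCAAC
```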

Methods can include first optimizing a mammalian synthetic nucleic acid sequence which encodes a protein of interest or a portion thereof, e.g., human Factor VIII, human Factor IX, human α-galactosidase, etc. The synthetic nucleic acid sequence can be optimized such that 94%, 95%, 96%, 97%, 98%, 99%, or all, of the codons of the synthetic DNA are replaced with common codons. The next step involves determining the amount of protein produced as a result of message optimization compared to the amount of protein produced using the wild-type sequence. In instances where the amount of protein produced is not of the desired or expected level, it may be desirable to replace one or more of the common codons of the protein-coding region with a less-common codon or non-common codon. A mammalian optimized message can be re-engineered by replacing its common codons with less-common or non-common mammalian codons, or with common codons of other eukaryotic species, such that at least 1%, 5%, 10%, 20% or more of the common codons are replaced. Re-engineering the optimized message can be done, for example, systematically, by replacing a single common codon with a less-common or non-common codon. Alternatively, a block of 2, 4, 6, 10, 20, 40 or more codons may be replaced with less-common or non-common codons. The level of protein produced by these “re-engineered optimized” messages determines which re-engineered optimized message is chosen.
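Systematic re-engineering of an optimized message can be sketched by replacing the common codons inside one block with synonymous alternatives and sliding the block along the sequence. Assumptions: the `COMMON` table is illustrative, and the replacement rule (take the alphabetically first synonymous codon that is not the common one) is a hypothetical placeholder; a real design would choose specific less-common or non-common codons. Methionine and tryptophan have a single codon and are left unchanged.

```python
# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
# ASSUMPTION: illustrative "common" codon per amino acid.
COMMON = {"A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
          "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
          "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
          "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG"}

# Synonymous codons for each amino acid, in alphabetical order.
SYNONYMS = {}
for codon, aa in sorted(CODE.items()):
    SYNONYMS.setdefault(aa, []).append(codon)


def deoptimize_block(seq, start, length):
    """Replace common codons in codon positions start..start+length-1 with a
    synonymous non-common codon (HYPOTHETICAL rule: first alternative)."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for i in range(start, min(start + length, len(codons))):
        codon = codons[i]
        aa = CODE[codon]
        if COMMON.get(aa) == codon:
            alternatives = [c for c in SYNONYMS[aa] if c != codon]
            if alternatives:  # Met (ATG) and Trp (TGG) have none
                codons[i] = alternatives[0]
    return "".join(codons)


def reengineered_variants(seq, block):
    """One variant per block position, sliding the block one codon at a time."""
    n_codons = len(seq) // 3
    return [deoptimize_block(seq, s, block) for s in range(n_codons - block + 1)]


for variant in reengineered_variants("GCCCGCAAC", block=1):
    print(variant)
```

Each variant would then be expressed and the one giving the desired protein level retained; that measurement step is experimental and is not modeled here.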

Another approach to optimizing a message for increased protein expression includes altering the specific nucleotide content of an optimized synthetic nucleic acid sequence. The synthetic nucleic acid sequence can be altered by increasing or decreasing the content of specific nucleotide(s), e.g., the G, C, A, T, GC or AT content of the sequence. Increasing or decreasing the specific nucleotide content of a synthetic nucleotide sequence can be done by substituting the nucleotide of interest with another nucleotide. For example, in a sequence that has a large number of codons with a high GC content, e.g., the glycine codon GGC, those codons can be substituted with codons that are less GC-rich, e.g., the glycine codon GGT, or with AT-rich codons. Similarly, in a sequence that has a large number of codons with a high AT content, those codons can be substituted with codons that are less AT-rich, e.g., GC-rich codons. Any region, or all, of a synthetic nucleic acid sequence can be altered in this manner, e.g., the 5′UTR (e.g., the promoter-proximal coding region), the coding region, the intron sequence, or the 3′UTR. Preferably, nucleotide substitutions in the coding region do not result in an alteration of the amino acid sequence of the expressed product. Preferably, the nucleotide content, e.g., GC or AT content, of a sequence is increased or reduced by 10%, 20%, 30%, 40% or more.
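The following sketch lowers (or raises) GC content while leaving the encoded amino acids unchanged, by replacing each codon with the synonymous codon having the fewest (or most) G and C bases. It applies the substitution to every codon, i.e., the maximal shift; a partial shift would apply it to a subset of codons only. Ties are broken alphabetically, so GGC becomes GGA rather than the GGT mentioned above; both are lower-GC glycine codons.

```python
# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Synonymous codons for each amino acid (stop codons map to each other).
SYNONYMS = {}
for codon, aa in sorted(CODE.items()):
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_count(codon):
    return codon.count("G") + codon.count("C")


def gc_content(seq):
    return (seq.count("G") + seq.count("C")) / len(seq)


def shift_gc(seq, lower=True):
    """Swap each codon for the synonymous codon with the fewest (lower=True)
    or most (lower=False) G+C bases; ties are broken alphabetically."""
    pick = min if lower else max
    out = []
    for i in range(0, len(seq), 3):
        synonyms = SYNONYMS[CODE[seq[i:i + 3]]]
        out.append(pick(synonyms, key=lambda c: (gc_count(c), c)))
    return "".join(out)


original = "GGCGGCAAG"
lowered = shift_gc(original)
print(lowered, gc_content(original), gc_content(lowered))
```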

The synthetic nucleic acid sequence can encode a mammalian, e.g., a human, protein. The protein can be, e.g., one which is endogenous to humans, or an engineered protein. Engineered proteins include proteins which differ from the native protein by one or more amino acid residues. Examples of such proteins include fragments, e.g., internal fragments or truncations, deletions, fusion proteins, and proteins having one or more amino acid replacements.

A sequence which encodes the protein can have one or more introns. The synthetic nucleic acid sequence can include introns as they are found in the non-optimized sequence, or can include introns from an unrelated gene. In other embodiments the intronic sequences can be modified. For example, all or part of one or more introns present in the gene can be removed, or introns not found in the sequence can be added. In preferred embodiments, one or more entire introns present in the gene are not present in the synthetic nucleic acid. In another embodiment, all or part of an intron present in a gene is replaced by another sequence, e.g., an intronic sequence from another protein.

The synthetic nucleic acid sequence can encode any protein, including: a blood factor, e.g., blood clotting factor V, blood clotting factor VII, blood clotting factor VIII, blood clotting factor IX, blood clotting factor X, or blood clotting factor XIII; an interleukin, e.g., interleukin 1, interleukin 2, interleukin 3, interleukin 6, interleukin 11, or interleukin 12; erythropoietin; calcitonin; growth hormone; insulin; insulinotropin; insulin-like growth factors; parathyroid hormone; β-interferon; γ-interferon; nerve growth factors; FSHβ; tumor necrosis factor; glucagon; bone growth factor-2; bone growth factor-7; TSH-β; CSF-granulocyte; CSF-macrophage; CSF-granulocyte/macrophage; immunoglobulins; catalytic antibodies; protein kinase C; glucocerebrosidase; superoxide dismutase; tissue plasminogen activator; urokinase; antithrombin III; DNAse; α-galactosidase; tyrosine hydroxylase; apolipoprotein E; apolipoprotein A-I; globins; low density lipoprotein receptor; IL-2 receptor; IL-2 antagonists; alpha-1 antitrypsin; immune response modifiers; soluble CD4; a protein expressed under disease conditions; and proteins encoded by viruses, e.g., proteins which are encoded by a virus (including a retrovirus) and are expressed in mammalian cells post-infection.

In preferred embodiments, the synthetic nucleic acid sequence can express its protein, e.g., a eukaryotic, e.g., mammalian, protein, at a level which is at least 110%, 150%, 200%, 500%, 1,000%, 5,000% or even 10,000% of that expressed by a nucleic acid sequence that has not been optimized. This comparison can be made, e.g., in an in vitro mammalian cell culture system wherein the non-optimized and optimized sequences are expressed under the same conditions (e.g., the same cell type, same culture conditions, same expression vector).
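The comparison above is a simple ratio. As a worked illustration with hypothetical numbers (not data from this document), an optimized construct giving 150 units against 20 units for the non-optimized construct under identical conditions corresponds to 750%, i.e., a 7.5-fold increase, which meets the 500% threshold but not the 1,000% one.

```python
def relative_expression_percent(optimized_level, native_level):
    """Expression from the optimized sequence as a percentage of the
    non-optimized sequence, measured under the same conditions."""
    if native_level <= 0:
        raise ValueError("native expression level must be positive")
    return 100.0 * optimized_level / native_level


def meets_threshold(optimized_level, native_level, threshold_percent):
    return relative_expression_percent(optimized_level, native_level) >= threshold_percent


# Hypothetical readings in arbitrary units (e.g., from an ELISA).
pct = relative_expression_percent(150, 20)
print(pct, pct / 100)  # 750.0 7.5
for threshold in (110, 150, 200, 500, 1000, 5000, 10000):
    print(threshold, meets_threshold(150, 20, threshold))
```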

Suitable cell culture systems for measuring expression of the synthetic nucleic acid sequence and the corresponding non-optimized nucleic acid sequence are known in the art (e.g., the pBS phagemid vectors, Stratagene, La Jolla, Calif.) and are described in, for example, the standard molecular biology reference books. Vectors suitable for expressing the synthetic and non-optimized nucleic acid sequences encoding the protein of interest are described below and in the standard reference books. Expression can be measured using an antibody specific for the protein of interest (e.g., by ELISA). Such antibodies and measurement techniques are known to those skilled in the art.

In a preferred embodiment the protein is a human protein. In more preferred embodiments, the protein is human Factor VIII, e.g., a B domain deleted human Factor VIII. In another preferred embodiment the protein is B domain deleted human Factor VIII with a sequence which includes a recognition site for an intracellular protease of the PACE/furin class, such as an X-Arg-X-X-Arg site, and/or a short peptide linker, e.g., a two-residue linker such as a leucine-glutamic acid (LE) linker, or a three- or four-residue linker, inserted at the heavy-light chain junction (see FIG. 1).

A large fraction of the codons in the human messages encoding Factor VIII and Factor IX are non-common codons or less-common codons. Replacement of at least 98% of these codons with common codons will yield nucleic acid sequences capable of higher-level expression in a cell culture. Preferably, all of the codons are replaced with common codons, and such replacement results in at least a 2- to 5-fold, more preferably a 10-fold, and most preferably a 20-fold increase in expression when compared to expression of the corresponding native sequence in the same expression system.
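Whether a sequence meets a threshold such as “at least 98% common codons” can be checked by counting codons against a common-codon table. The table below is an illustrative assumption, and stop codons are excluded from the count.

```python
# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
# ASSUMPTION: illustrative "common" codon per amino acid.
COMMON = {"A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
          "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
          "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
          "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG"}


def percent_common(seq):
    """Percentage of sense codons in an in-frame sequence that are common codons."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    sense = [c for c in codons if CODE[c] != "*"]
    if not sense:
        raise ValueError("no sense codons")
    return 100.0 * sum(COMMON[CODE[c]] == c for c in sense) / len(sense)


seq = "GCCCGTAACTGA"
print(percent_common(seq), percent_common(seq) >= 98)
```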

The synthetic nucleic acid sequences of the invention can be introduced into the cells of a living organism. The sequences can be introduced directly, e.g., via homologous recombination, or via a vector. For example, DNA constructs or vectors can be used to introduce a synthetic nucleic acid sequence into cells of a living organism for gene therapy. See, e.g., U.S. Pat. No. 5,460,959; and co-pending U.S. applications U.S. Ser. No. 08/334,797; U.S. Ser. No. 08/231,439; U.S. Ser. No. 08/334,455; and U.S. Ser. No. 08/928,881, which are hereby expressly incorporated by reference in their entirety.

Transfected or Infected Cells

Primary and secondary cells to be transfected or infected can be obtained from a variety of tissues and include cell types which can be maintained and propagated in culture. For example, primary and secondary cells which can be transfected or infected include fibroblasts, keratinocytes, epithelial cells (e.g., mammary epithelial cells, intestinal epithelial cells), endothelial cells, glial cells, neural cells, a cell comprising a formed element of the blood (e.g., lymphocytes, bone marrow cells), muscle cells and precursors of these somatic cell types. Primary cells are preferably obtained from the individual to whom the transfected or infected primary or secondary cells are administered. However, primary cells may be obtained from a donor (other than the recipient) of the same species or another species (e.g., mouse, rat, rabbit, cat, dog, pig, cow, bird, sheep, goat, horse).

Primary or secondary cells of vertebrate, particularly mammalian, origin can be transfected or infected with exogenous synthetic DNA encoding a therapeutic protein and can produce the encoded therapeutic protein stably and reproducibly, both in vitro and in vivo, over extended periods of time. In addition, the transfected or infected primary and secondary cells can express the encoded product in vivo at physiologically relevant levels, and the cells can be recovered after implantation and, upon reculturing, grow and display their preimplantation properties.

The transfected or infected primary or secondary cells may also include DNA encoding a selectable marker which confers a selectable phenotype upon them, facilitating their identification and isolation. Methods for producing transfected primary and secondary cells which stably express exogenous synthetic DNA, clonal cell strains and heterogenous cell strains of such transfected cells, methods of producing the clonal and heterogenous cell strains, and methods of treating or preventing an abnormal or undesirable condition through the use of populations of transfected primary or secondary cells are part of the present invention. Primary and secondary cells which can be transfected or infected include fibroblasts, keratinocytes, epithelial cells (e.g., mammary epithelial cells, intestinal epithelial cells), endothelial cells, glial cells, neural cells, a cell comprising a formed element of the blood (e.g., a lymphocyte, a bone marrow cell), muscle cells and precursors of these somatic cell types. Primary cells are preferably obtained from the individual to whom the transfected or infected primary or secondary cells are administered. However, primary cells may be obtained from a donor (other than the recipient) of the same species or another species (e.g., mouse, rat, rabbit, cat, dog, pig, cow, bird, sheep, goat, horse). Transformed or immortalized cells can also be used, e.g., a Bowes melanoma cell (ATCC Accession No. CRL 9607), a Daudi cell (ATCC Accession No. CCL 213), a HeLa cell and a derivative of a HeLa cell (ATCC Accession Nos. CCL 2, CCL 2.1, and CCL 2.2), a HL-60 cell (ATCC Accession No. CCL 240), a HT-1080 cell (ATCC Accession No. CCL 121), a Jurkat cell (ATCC Accession No. TIB 152), a KB carcinoma cell (ATCC Accession No. CCL 17), a K-562 leukemia cell (ATCC Accession No. CCL 243), a MCF-7 breast cancer cell (ATCC Accession No. HTB 22), a MOLT-4 cell (ATCC Accession No. 1582), a Namalwa cell (ATCC Accession No. CRL 1432), a Raji cell (ATCC Accession No. CCL 86), a RPMI 8226 cell (ATCC Accession No. CCL 155), a U-937 cell (ATCC Accession No. CRL 1593), WI-38VA13 sub line 2R4 cells (ATCC Accession No. CCL 75.1), a CCRF-CEM cell (ATCC Accession No. CCL 119) and a 2780AD ovarian carcinoma cell (Van Der Blick et al., Cancer Res. 48: 5927-5932, 1988), as well as heterohybridoma cells produced by fusion of human cells and cells of another species. In another embodiment, the immortalized cell line can be a cell line other than a human cell line, e.g., a CHO cell line or a COS cell line. In a preferred embodiment, the cell is a non-transformed cell. In various preferred embodiments, the cell is a mammalian cell, e.g., a primary or secondary mammalian cell, e.g., a fibroblast, a hematopoietic stem cell, a myoblast, a keratinocyte, an epithelial cell, an endothelial cell, a glial cell, a neural cell, a cell comprising a formed element of the blood, a muscle cell and precursors of these somatic cells. In a most preferred embodiment, the cell is a secondary human fibroblast.

Alternatively, DNA can be delivered into any of the cell types discussed above by viral vector infection. Viruses known to be useful for gene transfer include adenoviruses, adeno-associated virus, herpes virus, mumps virus, poliovirus, retroviruses, Sindbis virus, and poxviruses such as vaccinia virus and canary pox virus. Use of viral vectors is well known in the art: see, e.g., Robbins and Ghizzani, Mol. Med. Today 1:410-417, 1995. A cell which has had exogenous DNA introduced into it by a viral vector is referred to as an “infected cell”.

The invention also includes the genetic manipulation of a cell which normally produces a therapeutic protein. In this instance, the cell is manipulated such that the endogenous sequence which encodes the therapeutic protein is replaced with an optimized coding sequence, e.g., by homologous recombination.

Exogenous Synthetic DNA

Exogenous synthetic DNA incorporated into primary or secondary cells by the present method can be a synthetic DNA which encodes a protein, or a portion thereof, useful to treat an existing condition or prevent it from occurring.

Synthetic DNA incorporated into primary or secondary cells can be an entire gene encoding an entire desired protein or a gene portion which encodes, for example, the active or functional portion(s) of the protein. The protein can be, for example, a hormone, a cytokine, an antigen, an antibody, an enzyme, a clotting factor, e.g., Factor VIII or Factor XI, a transport protein, a receptor, a regulatory protein, a structural protein, or a protein which does not occur in nature. The DNA can be produced using genetic engineering techniques or synthetic processes. The DNA introduced into primary or secondary cells can encode one or more therapeutic proteins. After introduction into primary or secondary cells, the exogenous synthetic DNA is stably incorporated into the recipient cell's genome (along with the additional sequences present in the DNA construct used), from which it is expressed or otherwise functions. Alternatively, the exogenous synthetic DNA may exist episomally within the primary or secondary cells.

Selectable Markers

A variety of selectable markers can be incorporated into primary or secondary cells. For example, a selectable marker which confers a selectable phenotype such as drug resistance, nutritional auxotrophy, resistance to a cytotoxic agent or expression of a surface protein can be used. Selectable marker genes which can be used include neo, gpt, dhfr, ada, pac (puromycin), hyg and hisD. The selectable phenotype conferred makes it possible to identify and isolate recipient primary or secondary cells.

DNA Constructs

DNA constructs, which include exogenous synthetic DNA and, optionally, DNA encoding a selectable marker, along with additional sequences necessary for expression of the exogenous synthetic DNA in recipient primary or secondary cells, are used to transfect primary or secondary cells in which the encoded protein is to be produced. Alternatively, infectious vectors, such as retroviral, herpes, lentivirus, adenovirus, adeno-associated virus, mumps and poliovirus vectors, can be used for this purpose.

A DNA construct which includes the exogenous synthetic DNA and additional sequences, such as sequences necessary for expression of the exogenous synthetic DNA, can be used. A DNA construct which includes DNA encoding a selectable marker, along with additional sequences, such as a promoter, polyadenylation site and splice junctions, can be used to confer a selectable phenotype upon introduction into primary or secondary cells. The two DNA constructs are introduced into primary or secondary cells using methods described herein. Alternatively, one DNA construct which includes exogenous synthetic DNA, a selectable marker gene and additional sequences (e.g., those necessary for expression of the exogenous synthetic DNA and for expression of the selectable marker gene) can be used.

Transfection of Primary or Secondary Cells and Production of Clonal or Heterogenous Cell Strains

Vertebrate tissue can be obtained by standard methods such as punch biopsy or other surgical methods of obtaining a tissue source of the primary cell type of interest. For example, punch biopsy is used to obtain skin as a source of fibroblasts or keratinocytes. A mixture of primary cells is obtained from the tissue using known methods, such as enzymatic digestion. If enzymatic digestion is used, enzymes such as collagenase, hyaluronidase, dispase, pronase, trypsin, elastase and chymotrypsin can be used.

The resulting primary cell mixture can be transfected directly, or it can be cultured first, removed from the culture plate and resuspended before transfection is carried out. Primary or secondary cells are combined with the exogenous synthetic DNA to be stably integrated into their genomes and, optionally, DNA encoding a selectable marker, and are treated in order to accomplish transfection. The exogenous synthetic DNA and the selectable marker-encoding DNA can be on separate constructs or on a single construct, and enough DNA is used to ensure that at least one stably transfected cell containing and appropriately expressing the exogenous DNA is produced. In general, 0.1 to 500 μg DNA is used.

Primary or secondary cells can be transfected by electroporation. Electroporation is carried out at an appropriate voltage and capacitance (and time constant) to result in entry of the DNA construct(s) into the primary or secondary cells. Electroporation can be carried out over a wide range of voltages (e.g., 50 to 2000 volts) and capacitance values (e.g., 60-300 μFarads). Total DNA of approximately 0.1 to 500 μg is generally used.

Primary or secondary cells can be transfected using microinjection. Alternatively, known methods such as calcium phosphate precipitation, modified calcium phosphate precipitation and polybrene precipitation, liposome fusion and receptor-mediated gene delivery can be used to transfect cells. A stably transfected cell is isolated, cultured and subcultivated under culturing conditions and for sufficient time to propagate the stably transfected secondary cells and produce a clonal cell strain of transfected secondary cells. Alternatively, more than one transfected cell is cultured and subcultured, resulting in production of a heterogenous cell strain.

Transfected primary or secondary cells undergo a sufficient number of doublings to produce either a clonal cell strain or a heterogenous cell strain of sufficient size to provide the therapeutic protein to an individual in effective amounts. In general, for example, 0.1 cm² of skin is biopsied and assumed to contain 100,000 cells; one cell is used to produce a clonal cell strain and undergoes approximately 27 doublings to produce 100 million transfected secondary cells. If a heterogenous cell strain is to be produced from an original transfected population of approximately 100,000 cells, only 10 doublings are needed to produce 100 million transfected cells.
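The doubling numbers above follow from repeated doubling: starting from N0 cells, n doublings give N0·2^n cells, so the number needed is the smallest whole n with N0·2^n at least the target, i.e., ⌈log2(target/N0)⌉. The sketch assumes every cell divides in each round with no losses. One cell needs 27 doublings (2^27 = 134,217,728), while 100,000 cells need 10 (100,000 × 2^10 = 102,400,000).

```python
import math


def doublings_needed(start_cells, target_cells):
    """Smallest whole number of population doublings taking start_cells to at
    least target_cells, assuming every cell divides each round with no losses."""
    return math.ceil(math.log2(target_cells / start_cells))


clonal = doublings_needed(1, 100_000_000)
heterogenous = doublings_needed(100_000, 100_000_000)
print(clonal, 2 ** clonal)                         # 27 134217728
print(heterogenous, 100_000 * 2 ** heterogenous)   # 10 102400000
```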

The number of cells required in a transfected clonal or heterogenous cell strain is variable and depends on a variety of factors, including, but not limited to, the use of the transfected cells, the functional level of the exogenous DNA in the transfected cells, the site of implantation of the transfected cells (for example, the number of cells that can be used is limited by the anatomical site of implantation), and the age, surface area, and clinical condition of the patient. To put these factors in perspective, to deliver therapeutic levels of human growth hormone in an otherwise healthy 10 kg patient with isolated growth hormone deficiency, approximately one to five hundred million transfected fibroblasts would be necessary (the volume of these cells is about that of the very tip of the patient's thumb).

Episomal Expression of Exogenous Synthetic DNA

DNA sequences that are present within the cell yet do not integrate into the genome are referred to as episomes. Recombinant episomes may be useful in at least three settings: 1) if a given cell type is incapable of stably integrating the exogenous synthetic DNA; 2) if a given cell type is adversely affected by the integration of synthetic DNA; and 3) if a given cell type is capable of improved therapeutic function with an episomal rather than integrated synthetic DNA.

Using transfection and culturing as described herein, exogenous synthetic DNA in the form of episomes can be introduced into vertebrate primary and secondary cells. Plasmids can be converted into such an episome by the addition of DNA sequences for the Epstein-Barr virus origin of replication and nuclear antigen (Yates, J. L. Nature 319:780-7883 (1985)). Alternatively, vertebrate autonomously replicating sequences can be introduced into the construct (Weidle, U. H. Gene 73(2):427-437 (1988)). These and other episomally derived sequences can also be included in DNA constructs without selectable markers, such as pXGH5 (Selden et al., Mol Cell Biol. 6:3173-3179, 1986). The episomal synthetic exogenous DNA is then introduced into primary or secondary vertebrate cells as described in this application (if a selectable marker is included in the episome, a selective agent is used to treat the transfected cells).

Implantation of Clonal Cell Strain or Heterogenous Cell Strain of Transfected Secondary Cells

The transfected or infected cells produced as described above, e.g., a clonal cell strain or heterogenous cell strain, can be introduced into an individual to whom the therapeutic protein is to be delivered using known methods, by various routes of administration and at various sites (e.g., renal subcapsular, subcutaneous, central nervous system (including intrathecal), intravascular, intrahepatic, intrasplanchnic, intraperitoneal (including intraomental), or intramuscular implantation). In a preferred embodiment, the clonal cell strain or heterogeneous cell strain is introduced into the omentum. The omentum is a membranous structure containing a sheet of fat. Usually, the omentum is a fold of peritoneum extending from the stomach to adjacent abdominal organs. The greater omentum is attached to the inferior edge of the stomach and hangs down in front of the intestines; its other edge is attached to the transverse colon. The lesser omentum is attached to the superior edge of the stomach and extends to the undersurface of the liver. The cells may be introduced into any part of the omentum by surgical implantation, laparoscopy or direct injection, e.g., via a CT-guided or ultrasound-guided needle. Once implanted in the individual, the cells produce the therapeutic product encoded by the exogenous synthetic DNA or are affected by the exogenous synthetic DNA itself. For example, an individual who has been diagnosed with Hemophilia A, a bleeding disorder that is caused by a deficiency in Factor VIII, a protein normally found in the blood, is a candidate for a gene therapy treatment. In another example, an individual who has been diagnosed with Hemophilia B, a bleeding disorder that is caused by a deficiency in Factor IX, a protein normally found in the blood, is a candidate for a gene therapy treatment. The patient has a small skin biopsy performed. This is a simple procedure which can be performed on an out-patient basis.
The piece of skin, approximately the size of a match head, is taken, for example, from under the arm and requires about one minute to remove. The sample is processed to isolate the patient's cells, which are then genetically engineered to produce the missing Factor IX or Factor VIII. Based on the age, weight, and clinical condition of the patient, the required number of cells is grown in large-scale culture. The entire process requires 4-6 weeks and, at the end of that time, the appropriate number, e.g., approximately 100-500 million, of genetically engineered cells is introduced into the individual, once again as an outpatient (e.g., by injecting them back under the patient's skin). The patient is now capable of producing his or her own Factor IX or Factor VIII and is no longer a hemophiliac.

A similar approach can be used to treat other conditions or diseases. For example, short stature can be treated by administering human growth hormone to an individual by implanting primary or secondary cells which express human growth hormone; anemia can be treated by administering erythropoietin (EPO) to an individual by implanting primary or secondary cells which express EPO; or diabetes can be treated by administering glucagon-like peptide-1 (GLP-1) to an individual by implanting primary or secondary cells which express GLP-1. A lysosomal storage disease (LSD) can be treated by this approach. LSDs represent a group of at least 41 distinct genetic diseases, each one representing a deficiency of a particular protein that is involved in lysosomal biogenesis. A particular LSD can be treated by administering a lysosomal enzyme to an individual by implanting primary or secondary cells which express the lysosomal enzyme, e.g., Fabry disease can be treated by administering α-galactosidase to an individual by implanting primary or secondary cells which express α-galactosidase; Gaucher disease can be treated by administering β-glucocerebrosidase to an individual by implanting primary or secondary cells which express β-glucocerebrosidase; MPS (mucopolysaccharidosis) type I (Hurler-Scheie syndrome) can be treated by administering α-L-iduronidase to an individual by implanting primary or secondary cells which express α-L-iduronidase; MPS type II (Hunter syndrome) can be treated by administering iduronate-2-sulfatase to an individual by implanting primary or secondary cells which express iduronate-2-sulfatase; MPS type III-A (Sanfilippo A syndrome) can be treated by administering glucosamine-N-sulfatase to an individual by implanting primary or secondary cells which express glucosamine-N-sulfatase; MPS type III-B (Sanfilippo B syndrome) can be treated by administering alpha-N-acetylglucosaminidase to an individual by implanting primary or secondary cells which express alpha-N-acetylglucosaminidase; MPS type III-C (Sanfilippo C syndrome) can be treated by administering acetylcoenzyme A:α-glucosaminide-N-acetyltransferase to an individual by implanting primary or secondary cells which express acetylcoenzyme A:α-glucosaminide-N-acetyltransferase; MPS type III-D (Sanfilippo D syndrome) can be treated by administering N-acetylglucosamine-6-sulfatase to an individual by implanting primary or secondary cells which express N-acetylglucosamine-6-sulfatase; MPS type IV-A (Morquio A syndrome) can be treated by administering N-acetylgalactosamine-6-sulfatase to an individual by implanting primary or secondary cells which express N-acetylgalactosamine-6-sulfatase; MPS type IV-B (Morquio B syndrome) can be treated by administering β-galactosidase to an individual by implanting primary or secondary cells which express β-galactosidase; MPS type VI (Maroteaux-Lamy syndrome) can be treated by administering N-acetylgalactosamine-4-sulfatase to an individual by implanting primary or secondary cells which express N-acetylgalactosamine-4-sulfatase; and MPS type VII (Sly syndrome) can be treated by administering β-glucuronidase to an individual by implanting primary or secondary cells which express β-glucuronidase.

The cells used for implantation will generally be patient-specific genetically engineered cells. It is possible, however, to obtain cells from another individual of the same species or from a different species. Use of such cells might require administration of an immunosuppressant, alteration of histocompatibility antigens, or use of a barrier device to prevent rejection of the implanted cells. For many diseases, this will be a one-time treatment and, for others, multiple gene therapy treatments will be required.

Uses of Transfected or Infected Primary and Secondary Cells and Cell Strains

Transfected or infected primary or secondary cells or cell strains have wide applicability as a vehicle or delivery system for therapeutic proteins, such as enzymes, hormones, cytokines, antigens, antibodies, clotting factors, anti-sense RNA, regulatory proteins, transcription proteins, receptors, structural proteins, novel (non-optimized) proteins and nucleic acid products, and engineered DNA. For example, transfected primary or secondary cells can be used to supply a therapeutic protein, including, but not limited to, Factor VIII, Factor IX, erythropoietin, alpha-1 antitrypsin, calcitonin, glucocerebrosidase, growth hormone, low density lipoprotein (LDL) receptor, IL-2 receptor and its antagonists, insulin, globin, immunoglobulins, catalytic antibodies, the interleukins, insulin-like growth factors, superoxide dismutase, immune response modifiers, parathyroid hormone and interferon, nerve growth factors, tissue plasminogen activators, and colony stimulating factors. Alternatively, transfected primary and secondary cells can be used to immunize an individual (i.e., as a vaccine).

The wide variety of uses of cell strains of the present invention can perhaps most conveniently be summarized as shown below. The cell strains can be used to deliver the following therapeutic products:

1. a secreted protein with predominantly systemic effects;

2. a secreted protein with predominantly local effects;

3. a membrane protein imparting new or enhanced cellular responsiveness;

4. a membrane protein facilitating removal of a toxic product;

5. a membrane protein marking or targeting a cell;

6. an intracellular protein;

7. an intracellular protein directly affecting gene expression; and

8. an intracellular protein with autocrine effects.

Transfected or infected primary or secondary cells can be used to administer therapeutic proteins (e.g., hormones, enzymes, clotting factors) which are presently administered intravenously, intramuscularly or subcutaneously, which requires patient cooperation and, often, medical staff participation. When transfected or infected primary or secondary cells are used, there is no need for extensive purification of the polypeptide before it is administered to an individual, as is generally necessary with an isolated polypeptide. In addition, transfected or infected primary or secondary cells of the present invention produce the therapeutic protein as it would normally be produced.

An advantage of the use of transfected or infected primary or secondary cells is that by controlling the number of cells introduced into an individual, one can control the amount of the protein delivered to the body. In addition, in some cases, it is possible to remove the transfected or infected cells if there is no longer a need for the product. A further advantage of treatment by use of transfected or infected primary or secondary cells of the present invention is that production of the therapeutic product can be regulated, such as through the administration of zinc, steroids or an agent which affects transcription of a protein, product or nucleic acid product or affects the stability of a nucleic acid product.

Transgenic Animals

A number of methods have been used to obtain transgenic non-human mammals. A transgenic non-human mammal is a mammal that has gained an additional gene through the introduction of an exogenous synthetic nucleic acid sequence, i.e., a transgene, into its own cells (e.g., both the somatic and germ cells) or into an ancestor's germ line.

There are a number of methods for introducing exogenous DNA into the germ line (e.g., introduction into the germ or somatic cells) of a mammal. One method is microinjection of the gene construct into the pronucleus of an early-stage embryo (e.g., before the four-cell stage) (Wagner et al., Proc. Natl. Acad. Sci. USA 78:5016 (1981); Brinster et al., Proc. Natl. Acad. Sci. USA 82:4438 (1985)). The detailed procedure for producing such transgenic mice has been described (see, e.g., Hogan et al., Manipulating the Mouse Embryo, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1986); U.S. Pat. No. 5,175,383 (1992)). This procedure has also been adapted for other mammalian species (e.g., Hammer et al., Nature 315:680 (1985); Murray et al., Reprod. Fert. Devl. 1:147 (1989); Pursel et al., Vet. Immunol. Histopath. 17:303 (1987); Rexroad et al., J. Reprod. Fert. 41(suppl):119 (1990); Rexroad et al., Molec. Reprod. Devl. 1:164 (1989); Simons et al., BioTechnology 6:179 (1988); Vize et al., J. Cell. Sci. 90:295 (1988); and Wagner, J. Cell. Biochem. 13B(suppl):164 (1989)).

Another method for producing germ-line transgenic mammals is through the use of embryonic stem cells or somatic cells (e.g., embryonic, fetal or adult). The gene construct may be introduced into embryonic stem cells by homologous recombination (Thomas et al., Cell 51:503 (1987); Capecchi, Science 244:1288 (1989); Joyner et al., Nature 338:153 (1989)). A suitable construct may also be introduced into embryonic stem cells by DNA-mediated transfection, such as electroporation (Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons (1987)). Detailed procedures for culturing embryonic stem cells (e.g., ESD-3, ATCC#CCL-1934, ES-E14TG-2a, ATCC#CCL-1821, American Type Culture Collection, Rockville, Md.) and methods of making transgenic mammals from embryonic stem cells can be found in Teratocarcinomas and Embryonic Stem Cells, A Practical Approach, ed. E. J. Robertson (IRL Press, 1987). Methods of making transgenic animals from somatic cells can be found, for example, in WO 97/07669, WO 97/07668 and U.S. Pat. No. 5,945,577.

In the above methods for the generation of germ-line transgenic mammals, the construct may be introduced as a linear construct, as a circular plasmid, or as a vector, which may be incorporated and inherited as a transgene integrated into the host genome. The transgene may also be constructed so as to permit it to be inherited as an extrachromosomal plasmid (Gassmann, M. et al., Proc. Natl. Acad. Sci. USA 92:1292 (1995)).

Human Factor VIII

hFVIII is encoded by a 186 kilobase (kb) gene, with the coding region distributed among 26 exons (Gitschier et al., Nature 312:326-330 (1984)). Transcription of the gene and splicing of the resulting primary transcript yield an mRNA of approximately 9 kb, which encodes a primary translation product of 2351 amino acids (aa), including a 19 aa signal peptide. Excluding the signal peptide, the 2332 aa protein has a domain structure that can be represented as NH2-A1-A2-B-A3-C1-C2-COOH, with a predicted molecular mass of 265 kilodaltons (kD). Glycosylation of this protein results in a product with a molecular mass of approximately 330 kD as determined by SDS-PAGE. In plasma, hFVIII is a heterodimeric protein consisting of a heavy chain, ranging in size from 90 kD to 200 kD, in a metal ion complex with an 80 kD light chain. The heterodimeric complex is further stabilized by interactions with vWF. The heavy chain comprises domains A1-A2-B and the light chain comprises domains A3-C1-C2 (FIG. 2). Protease cleavage sites in the B-domain account for the size variation of the heavy chain, with the 90 kD species containing no B-domain sequences and the 200 kD species containing a complete or nearly complete B-domain. The B-domain has no known function, and it is fully removed upon hFVIII activation by thrombin.
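The 19 aa signal peptide can be seen directly in the synthetic coding sequence described in the Examples. The sketch below translates oligonucleotide AM1Af1 (SEQ ID NO: 7, copied from Table 3) from the ATG that follows its NheI site (GCTAGC), using the standard genetic code. The helper names are illustrative only; the first 19 residues form the signal peptide and the mature chain begins Ala-Thr-Arg-Arg-Tyr-Tyr.

```python
# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in-frame DNA; any trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Oligonucleotide AM1Af1 (SEQ ID NO: 7, Table 3): EcoRI clamp, NheI site, then the ATG.
AM1AF1 = ("GTAGAATTCGTAGGCTAGCATGCAGATCGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGCGC"
          "TTCTGCTTCAGCGCCACCCGCCGCTACTACCTGGGCGCCGTGGAGCTGAGCTGG")

cds_start = AM1AF1.find("GCTAGC") + len("GCTAGC")  # first base after the NheI site
protein = translate(AM1AF1[cds_start:])
signal_peptide, mature_start = protein[:19], protein[19:]

print(protein)         # MQIELSTCFFLCLLRFCFSATRRYYLGAVELSW
print(signal_peptide)  # MQIELSTCFFLCLLRFCFS
print(2351 - 19)       # 2332
```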

The human Factor VIII expression plasmids pXF8.186 (FIG. 3), pXF8.61 (FIG. 4), pXF8.38 (FIG. 11) and pXF8.224 (FIG. 13) are described below. The hFVIII expression plasmid pXF8.186 was developed on the basis of detailed optimization studies, which resulted in high-level expression of functional hFVIII. Given the extremely large size of the hFVIII gene and the need to transfer the entire coding region into cells, cDNA expression plasmids were developed for the production of stably transfected clonal cell strains. It has proven difficult to achieve high-level expression of hFVIII using the wild-type 9 kb cDNA. Potential reasons for this poor expression, and the features that were optimized to address them, are as follows. First, the wild-type cDNA encodes the 909 aa, heavily glycosylated B-domain, which is transiently attached to the heavy chain and has no known function (FIG. 1). Removal of the region encoding the B-domain from hFVIII expression constructs leads to greatly improved expression of a functional protein. Analysis of hFVIII derivatives lacking the B-domain has demonstrated that hFVIII function is not adversely affected and that such molecules have biochemical, immunologic and in vivo functional properties very similar to those of the wild-type protein. Two different B-domain-deleted (BDD) hFVIII expression constructs have been developed, which encode proteins with different amino acid sequences at the deletion junction. Plasmid pXF8.186 contains a complete deletion of the B-domain (amino acids 741-1648 of the wild-type mature protein sequence), with the sequence Arg-Arg-Arg-Arg (RRRR; SEQ ID NO:137) inserted at the heavy chain-light chain junction (FIG. 1). Because the heavy chain ends in an arginine residue (Arg740), this results in a string of five consecutive arginine residues (RRRRR or 5R; SEQ ID NO:138) at the heavy chain-light chain junction, which constitutes a recognition site for an intracellular protease of the PACE/furin class and was predicted to promote cleavage to produce the correct heavy and light chains. 
Plasmid pXF8.61 also contains a complete deletion of the B-domain, with a synthetic XhoI site at the junction. This linker results in the presence of the dipeptide sequence Leu-Glu (LE) at the heavy chain-light chain junction. To distinguish the two forms of BDD hFVIII, the expressed proteins are referred to herein as 5R and LE BDD hFVIII.
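The two junctions can be compared at the protein level. Oligonucleotide AM1Hf2 (SEQ ID NO: 54, Table 3), part of the synthetic coding sequence assembled in the Examples, spans the heavy chain-light chain junction and encodes ...AIEPR-LE-EITR.... In this sketch the 5R form is modeled, as an assumption for illustration only, by substituting the four inserted arginines for the Leu-Glu dipeptide; the helper names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate in-frame DNA; any trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def longest_run(protein: str, residue: str) -> int:
    """Length of the longest stretch of consecutive `residue` characters."""
    best = run = 0
    for aa in protein:
        run = run + 1 if aa == residue else 0
        best = max(best, run)
    return best


# Oligonucleotide AM1Hf2 (SEQ ID NO: 54, Table 3), read in frame.
AM1HF2 = ("GACAGCTACGAGGACATCAGCGCCTACCTGCTGAGCAAGAACAACGCCATCGAGCCCCGC"
          "CTGGAGGAGATCACCCGCACCACCCTGCAGAGCGACCAGGAG")
le_protein = translate(AM1HF2)

junction = le_protein.index("AIEPR") + len("AIEPR")  # heavy chain ends at Arg740
heavy, light = le_protein[:junction], le_protein[junction + 2:]  # skip the L-E dipeptide
five_r = heavy + "RRRR" + light                      # modeled 5R junction

print(le_protein[15:26])  # AIEPRLEEITR
print(five_r[15:28])      # AIEPRRRRREITR
print(longest_run(five_r, "R"))  # 5
```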

The second feature that has been reported to adversely affect hFVIII expression in transfected cells is the presence of one or more regions within the coding sequence that effectively function to block transcription of the cDNA. The inventors have now discovered that the negative influence of these sequence elements can be reduced or eliminated by altering the entire coding sequence. To this end, a completely synthetic B-domain-deleted hFVIII cDNA was prepared, as described in greater detail below. Silent base changes were made in all codons that did not correspond to the triplet most frequently found for that amino acid in highly expressed human proteins, and each such codon was converted to the codon most frequently found in humans for the corresponding amino acid. The resulting coding sequence differs from the wild-type sequence at 1094 of 4335 base pairs (25.2% of the bases), yet it encodes a protein with the wild-type hFVIII sequence (with the exception of the B-domain deletion). The GC content of the sequence increased from 44% to 64%. This sequence-altered BDD hFVIII cDNA is expressed at least 5.3-fold more efficiently than a non-altered control construct.
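The silent codon replacement described above can be sketched as follows. The PREFERRED table is an assumption for illustration: for each amino acid it lists the codon used throughout the synthetic oligonucleotides of Table 3 (e.g., CTG for Leu, AGC for Ser), which may differ in detail from the codon-usage data the inventors relied on. Stop codons are left unchanged, and the input is a hypothetical fragment, not hFVIII.

```python
# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Assumed "most common" human codon per amino acid (as used in Table 3).
PREFERRED = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def optimize(cds: str) -> str:
    """Replace every codon with the preferred codon for its amino acid."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(PREFERRED.get(CODON_TABLE[c], c) for c in codons)


def changed_bases(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(seq: str) -> float:
    return sum(base in "GC" for base in seq) / len(seq)


wild = "GCAAGAAATGATTGTTAA"  # hypothetical: Ala-Arg-Asn-Asp-Cys-stop
optimized = optimize(wild)
print(optimized)                        # GCCCGCAACGACTGCTAA
print(changed_bases(wild, optimized))   # 6
print(f"{gc_fraction(wild):.2f} -> {gc_fraction(optimized):.2f}")  # 0.28 -> 0.61
print(f"{100 * 1094 / 4335:.1f}%")      # 25.2%
```

Because only synonymous codons are substituted, translate(optimized) equals translate(wild); the changed-base fraction for the full BDD hFVIII cDNA is the 1094/4335 figure quoted above.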

The third feature that was optimized to improve hFVIII expression was the intron-exon structure of the expression construct. A cDNA is, by definition, devoid of introns. While this reduces the size of the expression construct, introns have been shown to have strong positive effects on gene expression when added to cDNA expression constructs. The 5′ untranslated region of the human beta-actin gene, which contains a complete, functional intron, was therefore incorporated into the BDD hFVIII expression constructs pXF8.61 and pXF8.186.

The fourth feature that can adversely affect hFVIII expression is the stability of the Factor VIII mRNA. The stability of the message affects the steady-state level of the Factor VIII mRNA and thereby influences gene expression. Specific sequences within the Factor VIII transcript can be altered so as to increase the stability of the mRNA; e.g., the removal of AUREs from the 3′ UTR can result in a more stable Factor VIII mRNA. The data presented below show that coding sequence re-engineering has general utility for improving the expression of mammalian and non-mammalian eukaryotic genes in mammalian cells. The results obtained here with human Factor VIII suggest that systematic codon optimization (without regard to CpG content) provides a fruitful strategy for improving the expression in mammalian cells of a wide variety of eukaryotic genes.
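Removing AUREs from a 3′ UTR begins with locating them. AU-rich elements are commonly characterized by AUUUA pentamers; the sketch below uses that single motif as a simplifying assumption (real AURE definitions are broader and context-dependent) and reports 0-based start positions, including overlapping occurrences, for RNA or DNA input.

```python
import re

# Lookahead pattern so that overlapping pentamers are all reported.
AUUUA_DNA = re.compile(r"(?=ATTTA)")


def find_auuua(utr: str) -> list[int]:
    """0-based start positions of AUUUA (ATTTA in DNA) pentamers."""
    dna = utr.upper().replace("U", "T")
    return [m.start() for m in AUUUA_DNA.finditer(dna)]


print(find_auuua("GCAUUUAUUUAGC"))  # [2, 6]
```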

Methods of Making Synthetic Nucleotide Sequences

A synthetic nucleic acid sequence that directs the synthesis of an optimized message of the invention can be made, e.g., by any of the methods described herein. The methods described below are advantageous for making optimized messages for the following reasons:

1) They allow production of a highly optimized coding sequence, e.g., one in which at least 94 to 100% of the codons are common codons, especially for proteins longer than 90 amino acids. The final product can be 100% optimized, i.e., every single nucleotide is as chosen, without the need to introduce undesirable alterations every 100-300 bp. A gene can be synthesized with 100% optimized codons, or it can be synthesized with exactly the codons that are desired. Additional DNA sequence elements can be introduced or avoided without any limitations imposed by the need to introduce restriction enzyme sites. Such sequence elements could include:

Transcriptional signals, such as enhancers or silencers.

Splicing signals, for example avoiding cryptic splice sites in a cDNA, or optimizing the splice site context in an intron-containing gene. Adding an intron to a cDNA may aid expression and allows the introduction of transcriptional signals within the gene.

Instability signals: the creation or avoidance of sequences that direct mRNA breakdown.

Secondary structure: the creation or avoidance of secondary structures in the mRNA that may affect mRNA stability, transcriptional termination, or translation.

Translational signals: codon choice. A gene can be synthesized with 100% optimal codons, or the codon bias for any amino acid can be altered without restriction, for example to make gene expression sensitive to the concentration of an aminoacyl-tRNA whose concentration may vary with growth or metabolic conditions.

In each case, the goal may be to increase or decrease expression, or to bring expression under a particular form of regulation.

2) They improve the accuracy of the synthetic sequence because they avoid PCR amplification, which introduces errors into the amplified sequence; and

3) They reduce the cost of making the synthetic sequence of the invention.

The synthetic nucleic acid sequence that directs the synthesis of the optimized messages of the invention can be prepared, e.g., by using the strategy outlined in greater detail below.

Strategy for Building a Sequence

The initial step is to devise a cloning protocol.

A sequence file containing the complete desired DNA sequence is generated. This sequence is analyzed for restriction sites, including fusion sites.
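The site analysis can be sketched as a simple pattern scan. The enzyme list below is limited to those named in Table 2 plus the EcoRI and HindIII cloning sites, and positions are 1-based, counted from the first base of each site to mirror the Table 2 convention. All of these sites are palindromic, so one strand suffices; non-palindromic sites (such as BseRI) would also require a reverse-complement scan, which this sketch omits.

```python
import re

# IUPAC codes needed for the sites below; N matches any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

# Palindromic recognition sequences of enzymes used in the Examples.
ENZYMES = {
    "NheI": "GCTAGC", "ApaI": "GGGCCC", "PmlI": "CACGTG", "BglII": "AGATCT",
    "BamHI": "GGATCC", "KpnI": "GGTACC", "BstEII": "GGTNACC",
    "SmaI": "CCCGGG", "EcoRV": "GATATC", "EcoRI": "GAATTC", "HindIII": "AAGCTT",
}


def find_sites(seq: str, enzymes: dict = ENZYMES) -> dict:
    """Map each enzyme found to the 1-based positions of its sites."""
    seq = seq.upper()
    hits = {}
    for name, site in enzymes.items():
        # Lookahead so overlapping occurrences are all reported.
        pattern = "(?=" + "".join(IUPAC[base] for base in site) + ")"
        positions = [m.start() + 1 for m in re.finditer(pattern, seq)]
        if positions:
            hits[name] = positions
    return hits


# 5' ends of oligonucleotides AM1Af1 and AM1Jf1 (Table 3).
print(find_sites("GTAGAATTCGTAGGCTAGCATG"))    # {'NheI': [14], 'EcoRI': [4]}
print(find_sites("GTAGAATTCGTAGGGTGACCTTCC"))  # {'BstEII': [14], 'EcoRI': [4]}
```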

Fusion sites are, in order of preference:

In the examples below, each fusion site is written as its top strand (5′→3′) followed by its bottom strand (3′→5′), with ^ marking the junction on each strand and (+N...) indicating bases added by fill-in.

A) Sequences resulting from the ligation of two complementary overhangs normally generated by available restriction enzymes, e.g.,

SalI/XhoI = G^TCGAG / CAGCT^C; BspDI/BstBI = AT^CGAA / TAGC^TT; or BstBI/AccI = TT^CGAC / AAGC^TG.

B) Sequences resulting from the ligation of two overhangs generated by partially filling in the overhangs of available restriction enzymes, e.g.,

XhoI(+TC)/BamHI(+GA) = CTC^GATCC / GAGCT^AGG.

C) Sequences resulting from the blunt ligation of two blunt ends normally generated by available restriction enzymes, e.g.,

EheI/SmaI = GGC^GGG / CCG^CCC.

D) Sequences resulting from the blunt ligation of two blunt ends, where one or both blunt ends have been generated by filling in an overhang, e.g.,

BamHI(+GATC)/SmaI = GGATC^GGG / CCTAG^CCC.
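The fusion sites listed above follow a simple rule for enzymes that leave 5′ overhangs or blunt ends: the top strand of the new junction is the left enzyme's site up to its top-strand cut, plus any bases added by filling in that end, followed by the right enzyme's site from its cut onward. The sketch below reproduces the examples from that rule; the enzyme table and function names are illustrative assumptions, and 3′-overhang enzymes are not handled.

```python
# Recognition sequence and top-strand cut offset (5'-overhang or blunt cutters).
ENZYMES = {
    "SalI": ("GTCGAC", 1),
    "XhoI": ("CTCGAG", 1),
    "BspDI": ("ATCGAT", 2),
    "BstBI": ("TTCGAA", 2),
    "AccI": ("GTCGAC", 2),  # AccI recognizes GT^MKAC; GTCGAC is one such site
    "BamHI": ("GGATCC", 1),
    "EheI": ("GGCGCC", 3),
    "SmaI": ("CCCGGG", 3),
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def fusion_site(left: str, right: str, left_fill: str = "") -> str:
    """Top strand (5'->3') of the junction formed by ligating two ends.

    `left_fill` holds bases added to the left end by fill-in. Filling in the
    right fragment's 5' overhang only extends its bottom strand, so it does
    not change the top-strand sequence computed here. '^' marks the junction.
    """
    lsite, lcut = ENZYMES[left]
    rsite, rcut = ENZYMES[right]
    return lsite[:lcut] + left_fill + "^" + rsite[rcut:]


def bottom_strand(top: str) -> str:
    """Complementary strand written 3'->5' (junction mark dropped)."""
    return top.replace("^", "").translate(COMPLEMENT)


print(fusion_site("SalI", "XhoI"))           # G^TCGAG
print(fusion_site("XhoI", "BamHI", "TC"))    # CTC^GATCC
print(fusion_site("BamHI", "SmaI", "GATC"))  # GGATC^GGG
print(bottom_strand("G^TCGAG"))              # CAGCTC
```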

The filling-in of a 5′ overhang generated by a restriction enzyme is performed using a DNA polymerase, for example the Klenow fragment of DNA polymerase I. If the overhang is to be filled in completely, then all four nucleotides (dATP, dCTP, dGTP and dTTP) are included in the reaction. If the overhang is to be only partially filled in, then the nucleotides required beyond the desired fill-in point are omitted from the reaction. In item (B) above, the XhoI-digested DNA would be filled in by Klenow in the presence of dCTP and dTTP, omitting dATP and dGTP.

An order of cloning steps is then determined that allows the use of sites about 150-500 bp apart. Note that a fragment need lack the recognition sequence for an enzyme only if that enzyme is used to clone the fragment. For example, the strategy for constructing the "desired" Factor VIII coding sequence can use ApaLI in a number of different places because, given the order of assembly of the fragments, ApaLI is not used in any of the later cloning steps.
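The partial fill-in rule can be stated compactly: Klenow copies a 5′ overhang in order and halts at the first position whose dNTP is absent, so the fill is the longest prefix of the overhang made only of supplied nucleotides. The sketch assumes overhangs are given 5′→3′ (TCGA for XhoI, GATC for BamHI) and models only this idealized behavior; the function names are hypothetical.

```python
def partial_fill(overhang: str, dntps: set) -> str:
    """Bases added opposite a 5' overhang (given 5'->3') with only `dntps`."""
    added = []
    for base in overhang:
        if base not in dntps:
            break  # polymerase stalls: the next required nucleotide is absent
        added.append(base)
    return "".join(added)


def dntps_for(overhang: str, n: int) -> set:
    """Nucleotides to supply so that exactly the first n positions fill in."""
    supply = set(overhang[:n])
    if n < len(overhang) and overhang[n] in supply:
        raise ValueError(f"cannot stop after {n} bases of {overhang}")
    return supply


print(partial_fill("TCGA", {"C", "T"}))  # TC  (XhoI, +TC)
print(partial_fill("GATC", {"G", "A"}))  # GA  (BamHI, +GA)
print(sorted(dntps_for("TCGA", 2)))      # ['C', 'T']
```

The last call corresponds to item (B): supplying dCTP and dTTP and omitting dATP and dGTP.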

If there is a region where no useful sites are available, then a sequence-independent strategy can be used: fragments are cloned into a DNA construct that contains recognition sequences for restriction enzymes that cleave outside of their recognition sequence (Type IIS enzymes), e.g.,

BseRI = GAGGAGNNNNNNNNNN^ (SEQ ID NO: 5) / CTCCTCNNNNNNNN^NN (SEQ ID NO: 6)

(Schematic: DNA construct | cloning site | gene fragment)

The recognition sequence of the enzyme used to clone the fragment is removed when the fragment is released by digestion with, e.g., BseRI, leaving a fragment consisting of 100% of the desired sequence, which can then be ligated to a similarly generated adjacent gene fragment.
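The release step can be sketched by locating the BseRI site and cutting 10 bases downstream on the top strand and 8 on the bottom, as in the cleavage pattern shown above. In this sketch the BseRI site and a 10-nt spacer are assumed to lie in the vector so that the top-strand cut falls exactly at the start of the gene fragment; the spacer and fragment sequences are hypothetical, and only forward-oriented sites are handled.

```python
import re


def type_iis_cuts(seq: str, site: str = "GAGGAG",
                  top_offset: int = 10, bottom_offset: int = 8) -> list:
    """(top_cut, bottom_cut) positions, 0-based between bases, per forward site."""
    cuts = []
    for m in re.finditer(f"(?={site})", seq.upper()):
        end = m.start() + len(site)
        cuts.append((end + top_offset, end + bottom_offset))
    return cuts


def release_fragment(seq: str) -> str:
    """Top strand downstream of the first BseRI top-strand cut."""
    top_cut, _ = type_iis_cuts(seq)[0]
    return seq[top_cut:]


# Hypothetical construct: vector | BseRI site | 10-nt spacer | gene fragment.
construct = "TTT" + "GAGGAG" + "ACGTACGTAC" + "ATGCATGC"
print(type_iis_cuts(construct))     # [(19, 17)]
print(release_fragment(construct))  # ATGCATGC
```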

The next step is to synthesize initial restriction fragments.

The synthesis of the initial restriction fragments can be achieved in a number of ways, including, but not limited to:

1. Chemical synthesis of the entire fragment.

2. Synthesize two oligonucleotides that are complementary at their 3′ ends, anneal them, and use the Klenow fragment of DNA polymerase I, or an equivalent, to extend them, giving a double-stranded fragment.

3. Synthesize a number of smaller oligonucleotides, kinase those oligos that have internal 5′ ends, then anneal and ligate all of the oligos, viz.:

5′___p______p_______3′
3′_____p_______p____5′
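The layout in the diagram above, in which each strand is split into oligonucleotides whose nicks are offset from those on the other strand, can be sketched as follows. Following the Examples, only the most 5′ oligonucleotide of each strand keeps a 5′-hydroxyl and all others are 5′-phosphorylated. The fragment and nick positions in the usage example are hypothetical, and real designs also choose overlaps long enough for stable annealing, which this sketch does not check.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def split_at(seq: str, cuts: list) -> list:
    bounds = [0, *cuts, len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def design_oligos(fragment: str, top_cuts: list, bottom_cuts: list) -> list:
    """Split a duplex into oligos; nick positions use top-strand coordinates.

    Returns (name, sequence 5'->3', needs_5prime_phosphate) tuples.
    """
    n = len(fragment)
    if any(not 0 < c < n for c in [*top_cuts, *bottom_cuts]):
        raise ValueError("nicks must lie inside the fragment")
    if set(top_cuts) & set(bottom_cuts):
        raise ValueError("top and bottom nicks must be staggered")
    top = split_at(fragment, sorted(top_cuts))
    # Bottom-strand oligos are written 5'->3', i.e. starting from the right end.
    bottom = split_at(revcomp(fragment), sorted(n - c for c in bottom_cuts))
    oligos = []
    for strand, pieces in (("f", top), ("r", bottom)):
        for i, piece in enumerate(pieces, start=1):
            oligos.append((f"{strand}{i}", piece, i > 1))  # only internal 5' ends get kinased
    return oligos


fragment = "ATGGCCAAGCTGGAGTTCACC"  # hypothetical 21-bp fragment
oligos = design_oligos(fragment, top_cuts=[7, 14], bottom_cuts=[4, 11, 18])
for name, seq, phos in oligos:
    print(name, seq, "5'-P" if phos else "5'-OH")
```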

Techniques 2 and 3 can be used in subsequent steps to join smaller fragments to each other. PCR can be used to increase the quantity of material for cloning, but it may lead to an increase in the number of mutations. If an error-free fragment is not obtained, then site-directed mutagenesis can be used to correct the best isolate. This is followed by concatenation of error-free fragments and sequencing of the junctions to confirm their precision.
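Technique 2 can be sketched at the sequence level: two oligonucleotides, both written 5′→3′, anneal through complementary 3′ ends, and extension of both 3′ ends yields a duplex whose top strand is the forward oligo followed by the non-overlapping part of the reverse complement of the second oligo. The 10-base minimum overlap and the example sequence are assumptions for illustration; mismatches and annealing thermodynamics are ignored.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extend_annealed(forward: str, reverse: str, min_overlap: int = 10) -> str:
    """Top strand of the duplex made by annealing and extending two oligos.

    The 3' end of `forward` must match the start of revcomp(reverse); the
    longest perfect overlap of at least `min_overlap` bases is used.
    """
    rc = revcomp(reverse)
    for k in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-k:] == rc[:k]:
            return forward + rc[k:]
    raise ValueError(f"oligos do not share a {min_overlap}-base 3' overlap")


target = "ATGGCCAAGCTGGAGTTCACCGGCATCTACAAG"  # hypothetical 33-bp product
forward = target[:22]             # covers bases 1-22
reverse = revcomp(target[12:])    # covers bases 13-33, overlapping by 10
print(extend_annealed(forward, reverse) == target)  # True
```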

Use

The synthetic nucleic acid sequences of the invention are useful for expressing a protein normally expressed in a mammalian cell, or in cell culture (e.g., for commercial production of human proteins such as GH, tPA, GLP-1, EPO, α-galactosidase, β-glucoceramidase, α-iduronidase, α-L-iduronidase, glucosamine-N-sulfatase, alpha-N-acetylglucosaminidase, acetyl-coenzyme A:α-glucosaminide-N-acetyltransferase, N-acetylglucosamine-6-sulfatase, β-galactosidase, N-acetylgalactosamine-6-sulfatase, β-glucuronidase, Factor VIII, and Factor IX). The synthetic nucleic acid sequences of the invention are also useful for gene therapy. For example, a synthetic nucleic acid sequence encoding a selected protein can be introduced directly (e.g., via non-viral cell transfection) or via a vector into a cell, e.g., a transformed or a non-transformed cell, which can express the protein, to create a cell that can be administered to a patient in need of the protein. Such cell-based gene therapy techniques are described in greater detail in co-pending U.S. applications U.S. Ser. No. 08/334,797; U.S. Ser. No. 08/231,439; U.S. Ser. No. 08/334,455; and U.S. Ser. No. 08/928,881, which are hereby expressly incorporated by reference in their entirety.

EXAMPLES I. Factor VIII Constructs and Uses Thereof

Construction of pXF8.61

The fourteen gene fragments (Fragment A through Fragment N) of the B-domain-deleted FVIII optimized cDNA listed in Table 2 and shown in FIG. 5 were made as follows. Ninety-two oligonucleotides were made by oligonucleotide synthesis on an ABI 391 synthesizer (Perkin Elmer). The 92 oligonucleotides are listed in Table 3. FIG. 5 shows how these 92 oligonucleotides anneal to form the fourteen gene fragments of Table 2. For each strand of each gene fragment, the first oligonucleotide (i.e., the most 5′) was manufactured with a 5′-hydroxyl terminus, and the subsequent oligonucleotides were manufactured as 5′-phosphorylated to allow the ligation of adjacent annealed oligonucleotides. For gene fragments A, B, C, F, G, J, K, L, M and N, six oligonucleotides were annealed, ligated, digested with EcoRI and HindIII and cloned into pUC18 digested with EcoRI and HindIII. For gene fragments D, E, H and I, eight oligonucleotides were annealed, ligated, digested with EcoRI and HindIII and cloned into pUC18 digested with EcoRI and HindIII. This procedure generated fourteen different plasmids, pAM1A through pAM1N.
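The bookkeeping in this paragraph can be checked in a few lines: ten six-oligonucleotide fragments plus four eight-oligonucleotide fragments give the 92 oligonucleotides of Table 3, and each assembled fragment carries an EcoRI site at its 5′ end and a HindIII site at its 3′ end (as seen in the Table 3 sequences), which is what permits directional cloning into EcoRI/HindIII-digested pUC18. The 15-nt window and the helper name are arbitrary choices for this sketch.

```python
# Oligonucleotides annealed per gene fragment, as stated above.
OLIGOS_PER_FRAGMENT = {name: 6 for name in "ABCFGJKLMN"}
OLIGOS_PER_FRAGMENT.update({name: 8 for name in "DEHI"})

ECORI, HINDIII = "GAATTC", "AAGCTT"


def flanked_for_cloning(top_strand: str, window: int = 15) -> bool:
    """True if EcoRI lies near the 5' end and HindIII near the 3' end."""
    return ECORI in top_strand[:window] and HINDIII in top_strand[-window:]


# 5' end of AM1Af1 (SEQ ID NO: 7) joined to the 3' end of AM1Af3 (SEQ ID NO: 9);
# the interior of fragment A is omitted here.
fragment_a_ends = "GTAGAATTCGTAGGCTAGCATG" + "CTGGGCCCCTACAAGCTTTAC"

print(len(OLIGOS_PER_FRAGMENT), sum(OLIGOS_PER_FRAGMENT.values()))  # 14 92
print(flanked_for_cloning(fragment_a_ends))                          # True
```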

TABLE 2
Fragment   5′ end               3′ end                Note
A          NheI 1               ApaI 279
B          ApaI 279             PmlI 544
C          PmlI 544             PmlI 829
D          PmlI 829             BglII(/BamHI) 1172    BamHI site 3′ to sequence
E          (BglII/)BamHI 1172   BglII 1583
F          BglII 1583           KpnI 1817
G          KpnI 1817            BamHI 2126
H          BamHI 2126           PmlI 2491
I          PmlI 2491            KpnI 3170             minus BstEII 2661-2955
J          BstEII 2661          BstEII 2955
K          KpnI 3170            ApaI 3482
L          ApaI 3482            SmaI(/EcoRV) 3772
M          (SmaI/)EcoRV 3772    BstEII 4062
N          BstEII 4062          SmaI 4348

In Table 2, the restriction site positions are numbered by the first base of the palindrome; numbering begins at the NheI site.
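The coordinates in Table 2 can be checked for consistency: each fragment should begin at the site where the previous one ends, with J nested inside I between its two BstEII sites. The sketch transcribes Table 2; because positions refer to the first base of each site, the differences it computes are site-to-site distances rather than exact insert lengths. Apart from I, which contains J, they fall within the 150-500 bp spacing mentioned in the cloning strategy.

```python
# Fragment boundaries transcribed from Table 2 (positions counted from NheI).
FRAGMENTS = {
    "A": (1, 279), "B": (279, 544), "C": (544, 829), "D": (829, 1172),
    "E": (1172, 1583), "F": (1583, 1817), "G": (1817, 2126), "H": (2126, 2491),
    "I": (2491, 3170), "J": (2661, 2955), "K": (3170, 3482), "L": (3482, 3772),
    "M": (3772, 4062), "N": (4062, 4348),
}


def check_tiling(fragments: dict, nested: tuple = ("J",)) -> bool:
    """Consecutive fragments share sites; nested ones lie inside a chain member."""
    chain = [f for f in sorted(fragments) if f not in nested]
    contiguous = all(fragments[a][1] == fragments[b][0]
                     for a, b in zip(chain, chain[1:]))
    contained = all(
        any(fragments[o][0] <= fragments[n][0] and fragments[n][1] <= fragments[o][1]
            for o in chain)
        for n in nested
    )
    return contiguous and contained


spacing = {name: end - start for name, (start, end) in FRAGMENTS.items()}
print(check_tiling(FRAGMENTS))         # True
print(spacing["I"], spacing["J"])      # 679 294
```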

TABLE 3
Oligo Name   Oligo Length   Oligonucleotide Sequence (5′→3′)
AM1Af1  118  GTAGAATTCGTAGGCTAGCATGCAGATCGAGCTGAGCACCTGCTTCTTCCTGTGCCTGCTGCGCTTCTGCTTCAGCGCCACCCGCCGCTACTACCTGGGCGCCGTGGAGCTGAGCTGG (SEQ ID NO: 7)
AM1Af2  104  GACTACATGCAGAGCGACCTGGGCGAGCTGCCCGTGGACGCCCGCTTCCCCCCCCGCGTGCCCAAGAGCTTCCCCTTCAACACCAGCGTGGTGTACAAGAAGAC (SEQ ID NO: 8)
AM1Af3  88  CCTGTTCGTGGAGTTCACCGACCACCTGTTCAACATCGCCAAGCCCCGCCCCCCCTGGATGGGCCTGCTGGGCCCCTACAAGCTTTAC (SEQ ID NO: 9)
AM1Ar1  119  GTAAAGCTTGTAGGGGCCCAGCAGGCCCATCCAGGGGGGGCGGGGCTTGGCGATGTTGAACAGGTGGTCGGTGAACTCCACGAACAGGGTCTTCTTGTACACCACGCTGGTGTTGAAGG (SEQ ID NO: 10)
AM1Ar2  107  GGAAGCTCTTGGGCACGCGGGGGGGGAAGCGGGCGTCCACGGGCAGCTCGCCCAGGTCGCTCTGCATGTAGTCCCAGCTCAGCTCCACGGCGCCCAGGTAGTAGCGG (SEQ ID NO: 11)
AM1Ar3  84  CGGGTGGCGCTGAAGCAGAAGCGCAGCAGGCACAGGAAGAAGCAGGTGCTCAGCTCGATCTGCATGCTAGCCTACGAATTCTAC (SEQ ID NO: 12)
AM1Bf1  115  GTAGAATTCGTAGGGGCCCCACCATCCAGGCCGAGGTGTACGACACCGTGGTGATCACCCTGAAGAACATGGCCAGCCACCCCGTGAGCCTGCACGCCGTGGGCGTGAGCTACTG (SEQ ID NO: 13)
AM1Bf2  103  GAAGGCCAGCGAGGGCGCCGAGTACGACGACCAGACCAGCCAGCGCGAGAAGGAGGACGACAAGGTGTTCCCCGGCGGCAGCCACACCTACGTGTGGCAGGTG (SEQ ID NO: 14)
AM1Bf3  79  CTGAAGGAGAACGGCCCCATGGCCAGCGACCCCCTGTGCCTGACCTACAGCTACCTGAGCCACGTGCTACAAGCTTTAC (SEQ ID NO: 15)
AM1Br1  107  GTAAAGCTTGTAGCACGTGGCTCAGGTAGCTGTAGGTCAGGCACAGGGGGTCGCTGGCCATGGGGCCGTTCTCCTTCAGCACCTGCCACACGTAGGTGTGGCTGCCG (SEQ ID NO: 16)
AM1Br2  101  CCGGGGAACACCTTGTCGTCCTCCTTCTCGCGCTGGCTGGTCTGGTCGTCGTACTCGGCGCCCTCGCTGGCCTTCCAGTAGCTCACGCCCACGGCGTGCAG (SEQ ID NO: 17)
AM1Br3  89  GCTCACGGGGTGGCTGGCCATGTTCTTCAGGGTGATCACCACGGTGTCGTACACCTCGGCCTGGATGGTGGGGCCCCTACGAATTCTAC (SEQ ID NO: 18)
AM1Cf1  122  GTAGAATTCGTAGCCACGTGGACCTGGTGAAGGACCTGAACAGCGGCCTGATCGGCGCCCTGCTGGTGTGCCGCGAGGGCAGCCTGGCCAAGGAGAAGACCCAGACCCTGCACAAGTTCATC (SEQ ID NO: 19)
AM1Cf2  110  CTGCTGTTCGCCGTGTTCGACGAGGGCAAGAGCTGGCACAGCGAGACCAAGAACAGCCTGATGCAGGACCGCGACGCCGCCAGCGCCCGCGCCTGGCCCAAGATGCACAC (SEQ ID NO: 20)
AM1Cf3  86  CGTGAACGGCTACGTGAACCGCAGCCTGCCCGGCCTGATCGGCTGCCACCGCAAGAGCGTGTACTGGCACGTGCTACAAGCTTTAC (SEQ ID NO: 21)
AM1Cr1  108  GTAAAGCTTGTAGCACGTGCCAGTACACGCTCTTGCGGTGGCAGCCGATCAGGCCGGGCAGGCTGCGGTTCACGTAGCCGTTCACGGTGTGCATCTTGGGCCAGGCGC (SEQ ID NO: 22)
AM1Cr2  110  GGGCGCTGGCGGCGTCGCGGTCCTGCATCAGGCTGTTCTTGGTCTCGCTGTGCCAGCTCTTGCCCTCGTCGAACACGGCGAACAGCAGGATGAACTTGTGCAGGGTCTGG (SEQ ID NO: 23)
AM1Cr3  100  GTCTTCTCCTTGGCCAGGCTGCCCTCGCGGCACACCAGCAGGGCGCCGATCAGGCCGCTGTTCAGGTCCTTCACCAGGTCCACGTGGCTACGAATTCTAC (SEQ ID NO: 24)
AM1Df1  99  GTAGAATTCGTAGCACGTGATCGGCATGGGCACCACCCCCGAGGTGCACAGCATCTTCCTGGAGGGCCACACCTTCCTGGTGCGCAACCACCGCCAGGC (SEQ ID NO: 25)
AM1Df2  100  CAGCCTGGAGATCAGCCCCATCACCTTCCTGACCGCCCAGACCCTGCTGATGGACCTGGGCCAGTTCCTGCTGTTCTGCCACATCAGCAGCCACCAGCAC (SEQ ID NO: 26)
AM1Df3  101  GACGGCATGGAGGCCTACGTGAAGGTGGACAGCTGCCCCGAGGAGCCCCAGCTGCGCATGAAGAACAACGAGGAGGCCGAGGACTACGACGACGACCTGAC (SEQ ID NO: 27)
AM1Df4  84  CGACAGCGAGATGGACGTGGTGCGCTTCGACGACGACAACAGCCCCAGCTTCATCCAGATCTCTACGGATCCTACAAGCTTTAC (SEQ ID NO: 28)
AM1Dr1  109  GTAAAGCTTGTAGGATCCGTAGAGATCTGGATGAAGCTGGGGCTGTTGTCGTCGTCGAAGCGCACCACGTCCATCTCGCTGTCGGTCAGGTCGTCGTCGTAGTCCTCGG (SEQ ID NO: 29)
AM1Dr2  101  CCTCCTCGTTGTTCTTCATGCGCAGCTGGGGCTCCTCGGGGCAGCTGTCCACCTTCACGTAGGCCTCCATGCCGTCGTGCTGGTGGCTGCTGATGTGGCAG (SEQ ID NO: 30)
AM1Dr3  102  AACAGCAGGAACTGGCCCAGGTCCATCAGCAGGGTCTGGGCGGTCAGGAAGGTGATGGGGCTGATCTCCAGGCTGGCCTGGCGGTGGTTGCGCACCAGGAAG (SEQ ID NO: 31)
AM1Dr4  72  GTGTGGCCCTCCAGGAAGATGCTGTGCACCTCGGGGGTGGTGCCCATGCCGATCACGTGCTACGAATTCTAC (SEQ ID NO: 32)
AM1Ef1  122  GTAGAATTCGTAGGGATCCGCAGCGTGGCCAAGAAGCACCCCAAGACCTGGGTGCACTACATCGCCGCCGAGGAGGAGGACTGGGACTACGCCCCCCTGGTGCTGGCCCCCGACGACCGCAG (SEQ ID NO: 33)
AM1Ef2  120  CTACAAGAGCCAGTACCTGAACAACGGCCCCCAGCGCATCGGCCGCAAGTACAAGAAGGTGCGCTTCATGGCCTACACCGACGAGACCTTCAAGACCCGCGAGGCCATCCAGCACGAGAG (SEQ ID NO: 34)
AM1Ef3  115  CGGCATCCTGGGCCCCCTGCTGTACGGCGAGGTGGGCGACACCCTGCTGATCATCTTCAAGAACCAGGCCAGCCGCCCCTACAACATCTACCCCCACGGCATCACCGACGTGCGC (SEQ ID NO: 35)
AM1Ef4  86  CCCCTGTACAGCCGCCGCCTGCCCAAGGGCGTGAAGCACCTGAAGGACTTCCCCATCCTGCCCGGCGAGATCTCTACAAGCTTTAC (SEQ ID NO: 36)
AM1Er1  109  GTAAAGCTTGTAGAGATCTCGCCGGGCAGGATGGGGAAGTCCTTCAGGTGCTTCACGCCCTTGGGCAGGCGGCGGCTGTACAGGGGGCGCACGTCGGTGATGCCGTGGG (SEQ ID NO: 37)
AM1Er2  114  GGTAGATGTTGTAGGGGCGGCTGGCCTGGTTCTTGAAGATGATCAGCAGGGTGTCGCCCACCTCGCCGTACAGCAGGGGGCCCAGGATGCCGCTCTCGTGCTGGATGGCCTCGC (SEQ ID NO: 38)
AM1Er3  121  GGGTCTTGAAGGTCTCGTCGGTGTAGGCCATGAAGCGCACCTTCTTGTACTTGCGGCCGATGCGCTGGGGGCCGTTGTTCAGGTACTGGCTCTTGTAGCTGCGGTCGTCGGGGGCCAGCAC (SEQ ID NO: 39)
AM1Er4  99  CAGGGGGGCGTAGTCCCAGTCCTCCTCCTCGGCGGCGATGTAGTGCACCCAGGTCTTGGGGTGCTTCTTGGCCACGCTGCGGATCCCTACGAATTCTAC (SEQ ID NO: 40)
AM1Ff1  102  GTAGAATTCGTAGAGATCTTCAAGTACAAGTGGACCGTGACCGTGGAGGACGGCCCCACCAAGAGCGACCCCCGCTGCCTGACCCGCTACTACAGCAGCTTC (SEQ ID NO: 41)
AM1Ff2  103  GTGAACATGGAGCGCGACCTGGCCAGCGGCCTGATCGGCCCCCTGCTGATCTGCTACAAGGAGAGCGTGGACCAGCGCGGCAACCAGATCATGAGCGACAAGC (SEQ ID NO: 42)
AM1Ff3  61  GCAACGTGATCCTGTTCAGCGTGTTCGACGAGAACCGCAGCTGGTACCCTACAAGCTTTAC (SEQ ID NO: 43)
AM1Fr1  87  GTAAAGCTTGTAGGGTACCAGCTGCGGTTCTCGTCGAACACGCTGAACAGGATCACGTTGCGCTTGTCGCTCATGATCTGGTTGCCG (SEQ ID NO: 44)
AM1Fr2  101  CGCTGGTCCACGCTCTCCTTGTAGCAGATCAGCAGGGGGCCGATCAGGCCGCTGGCCAGGTCGCGCTCCATGTTCACGAAGCTGCTGTAGTAGCGGGTCAG (SEQ ID NO: 45)
AM1Fr3  78  GCAGCGGGGGTCGCTCTTGGTGGGGCCGTCCTCCACGGTCACGGTCCACTTGTACTTGAAGATCTCTACGAATTCTAC (SEQ ID NO: 46)
AM1Gf1  120  GTAGAATTCGTAGGGTACCTGACCGAGAACATCCAGCGCTTCCTGCCCAACCCCGCCGGCGTGCAGCTGGAGGACCCCGAGTTCCAGGCCAGCAACATCATGCACAGCATCAACGGCTAC (SEQ ID NO: 47)
AM1Gf2  126  GTGTTCGACAGCCTGCAGCTGAGCGTGTGCCTGCACGAGGTGGCCTACTGGTACATCCTGAGCATCGGCGCCCAGACCGACTTCCTGAGCGTGTTCTTCAGCGGCTACACCTTCAAGCACAAGATG (SEQ ID NO: 48)
AM1Gf3  95  GTGTACGAGGACACCCTGACCCTGTTCCCCTTCAGCGGCGAGACCGTGTTCATGAGCATGGAGAACCCCGGCCTGTGGATCCCTACAAGCTTTAC (SEQ ID NO: 49)
AM1Gr1  119  GTAAAGCTTGTAGGGATCCACAGGCCGGGGTTCTCCATGCTCATGAACACGGTCTCGCCGCTGAAGGGGAACAGGGTCAGGGTGTCCTCGTACACCATCTTGTGCTTGAAGGTGTAGCC (SEQ ID NO: 50)
AM1Gr2  124  GCTGAAGAACACGCTCAGGAAGTCGGTCTGGGCGCCGATGCTCAGGATGTACCAGTAGGCCACCTCGTGCAGGCACACGCTCAGCTGCAGGCTGTCGAACACGTAGCCGTTGATGCTGTGCATG (SEQ ID NO: 51)
AM1Gr3  98  ATGTTGCTGGCCTGGAACTCGGGGTCCTCCAGCTGCACGCCGGCGGGGTTGGGCAGGAAGCGCTGGATGTTCTCGGTCAGGTACCCTACGAATTCTAC (SEQ ID NO: 52)
AM1Hf1  111  GTAGAATTCGTAGGGATCCTGGGCTGCCACAACAGCGACTTCCGCAACCGCGGCATGACCGCCCTGCTGAAGGTGAGCAGCTGCGACAAGAACACCGGCGACTACTACGAG (SEQ ID NO: 53)
AM1Hf2  102  GACAGCTACGAGGACATCAGCGCCTACCTGCTGAGCAAGAACAACGCCATCGAGCCCCGCCTGGAGGAGATCACCCGCACCACCCTGCAGAGCGACCAGGAG (SEQ ID NO: 54)
AM1Hf3  105  GAGATCGACTACGACGACACCATCAGCGTGGAGATGAAGAAGGAGGACTTCGACATCTACGACGAGGACGAGAACCAGAGCCCCCGCAGCTTCCAGAAGAAGACC (SEQ ID NO: 55)
AM1Hf4  79  CGCCACTACTTCATCGCCGCCGTGGAGCGCCTGTGGGACTACGGCATGAGCAGCAGCCCCCACGTGCTACAAGCTTTAC (SEQ ID NO: 56)
AM1Hr1  101  GTAAAGCTTGTAGCACGTGGGGGCTGCTGCTCATGCCGTAGTCCCACAGGCGCTCCACGGCGGCGATGAAGTAGTGGCGGGTCTTCTTCTGGAAGCTGCGG (SEQ ID NO: 57)
AM1Hr2  105  GGGCTCTGGTTCTCGTCCTCGTCGTAGATGTCGAAGTCCTCCTTCTTCATCTCCACGCTGATGGTGTCGTCGTAGTCGATCTCCTCCTGGTCGCTCTGCAGGGTG (SEQ ID NO: 58)
AM1Hr3  108  GTGCGGGTGATCTCCTCCAGGCGGGGCTCGATGGCGTTGTTCTTGCTCAGCAGGTAGGCGCTGATGTCCTCGTAGCTGTCCTCGTAGTAGTCGCCGGTGTTCTTGTCG (SEQ ID NO: 59)
AM1Hr4  83  CAGCTGCTCACCTTCAGCAGGGCGGTCATGCCGCGGTTGCGGAAGTCGCTGTTGTGGCAGCCCAGGATCCCTACGAATTCTAC (SEQ ID NO: 60)
AM1If1  115  GTAGAATTCGTAGCACGTGCTGCGCAACCGCGCCCAGAGCGGCAGCGTGCCCCAGTTCAAGAAGGTGGTGTTCCAGGAGTTCACCGACGGCAGCTTCACCCAGCCCCTGTACCGC (SEQ ID NO: 61)
AM1If2  111  GGCGAGCTGAACGAGCACCTGGGCCTGCTGGGCCCCTACATCCGCGCCGAGGTGGAGGACAACATCATGGTGACCGTGCAGGAGTTCGCCCTGTTCTTCACCATCTTCGAC (SEQ ID NO: 62)
AM1If3  106  GAGACCAAGAGCTGGTACTTCACCGAGAACATGGAGCGCAACTGCCGCGCCCCCTGCAACATCCAGATGGAGGACCCCACCTTCAAGGAGAACTACCGCTTCCACG (SEQ ID NO: 63)
AM1If4  85  CCATCAACGGCTACATCATGGACACCCTGCCCGGCCTGGTGATGGCCCAGGACCAGCGCATCCGCTGGTACCCTACAAGCTTTAC (SEQ ID NO: 64)
AM1Ir1  115  GTAAAGCTTGTAGGGTACCAGCGGATGCGCTGGTCCTGGGCCATCACCAGGCCGGGCAGGGTGTCCATGATGTAGCCGTTGATGGCGTGGAAGCGGTAGTTCTCCTTGAAGGTGG (SEQ ID NO: 65)
AM1Ir2  99  GGTCCTCCATCTGGATGTTGCAGGGGGCGCGGCAGTTGCGCTCCATGTTCTCGGTGAAGTACCAGCTCTTGGTCTCGTCGAAGATGGTGAAGAACAGGG (SEQ ID NO: 66)
AM1Ir3  110  CGAACTCCTGCACGGTCACCATGATGTTGTCCTCCACCTCGGCGCGGATGTAGGGGCCCAGCAGGCCCAGGTGCTCGTTCAGCTCGCCGCGGTACAGGGGCTGGGTGAAG (SEQ ID NO: 67)
AM1Ir4  93  CTGCCGTCGGTGAACTCCTGGAACACCACCTTCTTGAACTGGGGCACGCTGCCGCTCTGGGCGCGGTTGCGCAGCACGTGCTACGAATTCTAC (SEQ ID NO: 68)
AM1Jf1  116  GTAGAATTCGTAGGGTGACCTTCCGCAACCAGGCCAGCCGCCCCTACAGCTTCTACAGCAGCCTGATCAGCTACGAGGAGGACCAGCGCCAGGGCGCCGAGCCCCGCAAGAACTTC (SEQ ID NO: 69)
AM1Jf2  120  GTGAAGCCCAACGAGACCAAGACCTACTTCTGGAAGGTGCAGCACCACATGGCCCCCACCAAGGACGAGTTCGACTGCAAGGCCTGGGCCTACTTCAGCGACGTGGACCTGGAGAAGGAC (SEQ ID NO: 70)
AM1Jf3  91  GTGCACAGCGGCCTGATCGGCCCCCTGCTGGTGTGCCACACCAACACCCTGAACCCCGCCCACGGCCGCCAGGTGACCCTACAAGCTTTAC (SEQ ID NO: 71)
AM1Jr1  113  GTAAAGCTTGTAGGGTCACCTGGCGGCCGTGGGCGGGGTTCAGGGTGTTGGTGTGGCACACCAGCAGGGGGCCGATCAGGCCGCTGTGCACGTCCTTCTCCAGGTCCACGTCG (SEQ ID NO: 72)
AM1Jr2  121  CTGAAGTAGGCCCAGGCCTTGCAGTCGAACTCGTCCTTGGTGGGGGCCATGTGGTGCTGCACCTTCCAGAAGTAGGTCTTGGTCTCGTTGGGCTTCACGAAGTTCTTGCGGGGCTCGGCGC (SEQ ID NO: 73)
AM1Jr3  93  CCTGGCGCTGGTCCTCCTCGTAGCTGATCAGGCTGCTGTAGAAGCTGTAGGGGCGGCTGGCCTGGTTGCGGAAGGTCACCCTACGAATTCTAC (SEQ ID NO: 74)
AM1Kf1  120  GTAGAATTCGTAGGGTACCTGCTGAGCATGGGCAGCAACGAGAACATCCACAGCATCCACTTCAGCGGCCACGTGTTCACCGTGCGCAAGAAGGAGGAGTACAAGATGGCCCTGTACAAC (SEQ ID NO: 75)
AM1Kf2  122  CTGTACCCCGGCGTGTTCGAGACCGTGGAGATGCTGCCCAGCAAGGCCGGCATCTGGCGCGTGGAGTGCCTGATCGGCGAGCACCTGCACGCCGGCATGAGCACCCTGTTCCTGGTGTACAG (SEQ ID NO: 76)
AM1Kf3  102  CAACAAGTGCCAGACCCCCCTGGGCATGGCCAGCGGCCACATCCGCGACTTCCAGATCACCGCCAGCGGCCAGTACGGCCAGTGGGCCCCTACAAGCTTTAC (SEQ ID NO: 77)
AM1Kr1  123  GTAAAGCTTGTAGGGGCCCACTGGCCGTACTGGCCGCTGGCGGTGATCTGGAAGTCGCGGATGTGGCCGCTGGCCATGCCCAGGGGGGTCTGGCACTTGTTGCTGTACACCAGGAACAGGGTG (SEQ ID NO: 78)
AM1Kr2  125  CTCATGCCGGCGTGCAGGTGCTCGCCGATCAGGCACTCCACGCGCCAGATGCCGGCCTTGCTGGGCAGCATCTCCACGGTCTCGAACACGCCGGGGTACAGGTTGTACAGGGCCATCTTGTACTC (SEQ ID NO: 79)
AM1Kr3  96  CTCCTTCTTGCGCACGGTGAACACGTGGCCGCTGAAGTGGATGCTGTGGATGTTCTCGTTGCTGCCCATGCTCAGCAGGTACCCTACGAATTCTAC (SEQ ID NO: 80)
AM1Lf1  120  GTAGAATTCGTAGGGGCCCCCAAGCTGGCCCGCCTGCACTACAGCGGCAGCATCAACGCCTGGAGCACCAAGGAGCCCTTCAGCTGGATCAAGGTGGACCTGCTGGCCCCCATGATCATC (SEQ ID NO: 81)
AM1Lf2  116  CACGGCATCAAGACCCAGGGCGCCCGCCAGAAGTTCAGCAGCCTGTACATCAGCCAGTTCATCATCATGTACAGCCTGGACGGCAAGAAGTGGCAGACCTACCGCGGCAACAGCAC (SEQ ID NO: 82)
AM1Lf3  86  CGGCACCCTGATGGTGTTCTTCGGCAACGTGGACAGCAGCGGCATCAAGCACAACATCTTCAACCCCCCCGGGCTACAAGCTTTAC (SEQ ID NO: 83)
AM1Lr1  110  GTAAAGCTTGTAGCCCGGGGGGGTTGAAGATGTTGTGCTTGATGCCGCTGCTGTCCACGTTGCCGAAGAACACCATCAGGGTGCCGGTGCTGTTGCCGCGGTAGGTCTGC (SEQ ID NO: 84)
AM1Lr2  113  CACTTCTTGCCGTCCAGGCTGTACATGATGATGAACTGGCTGATGTACAGGCTGCTGAACTTCTGGCGGGCGCCCTGGGTCTTGATGCCGTGGATGATCATGGGGGCCAGCAG (SEQ ID NO: 85)
AM1Lr3  99  GTCCACCTTGATCCAGCTGAAGGGCTCCTTGGTGCTCCAGGCGTTGATGCTGCCGCTGTAGTGCAGGCGGGCCAGCTTGGGGGCCCCTACGAATTCTAC (SEQ ID NO: 86)
AM1Mf1  122  GTAGAATTCGTAGGATATCATCGCCCGCTACATCCGCCTGCACCCCACCCACTACAGCATCCGCAGCACCCTGCGCATGGAGCTGATGGGCTGCGACCTGAACAGCTGCAGCATGCCCCTGG (SEQ ID NO: 87)
AM1Mf2  112  GCATGGAGAGCAAGGCCATCAGCGACGCCCAGATCACCGCCAGCAGCTACTTCACCAACATGTTCGCCACCTGGAGCCCCAGCAAGGCCCGCCTGCACCTGCAGGGCCGCAG (SEQ ID NO: 88)
AM1Mf3  89  CAACGCCTGGCGCCCCCAGGTGAACAACCCCAAGGAGTGGCTGCAGGTGGACTTCCAGAAGACCATGAAGGTGACCCTACAAGCTTTAC (SEQ ID NO: 89)
AM1Mr1  112  GTAAAGCTTGTAGGGTCACCTTCATGGTCTTCTGGAAGTCCACCTGCAGCCACTCCTTGGGGTTGTTCACCTGGGGGCGCCAGGCGTTGCTGCGGCCCTGCAGGTGCAGGCG (SEQ ID NO: 90)
AM1Mr2  114  GGCCTTGCTGGGGCTCCAGGTGGCGAACATGTTGGTGAAGTAGCTGCTGGCGGTGATCTGGGCGTCGCTGATGGCCTTGCTCTCCATGCCCAGGGGCATGCTGCAGCTGTTCAG (SEQ ID NO: 91)
AM1Mr3  97  GTCGCAGCCCATCAGCTCCATGCGCAGGGTGCTGCGGATGCTGTAGTGGGTGGGGTGCAGGCGGATGTAGCGGGCGATGATATCCTACGAATTCTAC (SEQ ID NO: 92)
AM1Nf1  122  GTAGAATTCGTAGGGTGACCGGCGTGACCACCCAGGGCGTGAAGAGCCTGCTGACCAGCATGTACGTGAAGGAGTTCCTGATCAGCAGCAGCCAGGACGGCCACCAGTGGACCCTGTTCTTC (SEQ ID NO: 93)
AM1Nf2  104  CAGAACGGCAAGGTGAAGGTGTTCCAGGGCAACCAGGACAGCTTCACCCCCGTGGTGAACAGCCTGGACCCCCCCCTGCTGACCCGCTACCTGCGCATCCACCC (SEQ ID NO: 94)
AM1Nf3  92  CCAGAGCTGGGTGCACCAGATCGCCCTGCGCATGGAGGTGCTGGGCTGCGAGGCCCAGGACCTGTACTAGCTGCCCGGGCTACAAGCTTTAC (SEQ ID NO: 95)
AM1Nr1  118  GTAAAGCTTGTAGCCCGGGCAGCTAGTACAGGTCCTGGGCCTCGCAGCCCAGCACCTCCATGCGCAGGGCGATCTGGTGCACCCAGCTCTGGGGGTGGATGCGCAGGTAGCGGGTCAG (SEQ ID NO: 96)
AM1Nr2  100  CAGGGGGGGGTCCAGGCTGTTCACCACGGGGGTGAAGCTGTCCTGGTTGCCCTGGAACACCTTCACCTTGCCGTTCTGGAAGAACAGGGTCCACTGGTGG (SEQ ID NO: 97)
AM1Nr3  100  CCGTCCTGGCTGCTGCTGATCAGGAACTCCTTCACGTACATGCTGGTCAGCAGGCTCTTCACGCCCTGGGTGGTCACGCCGGTCACCCTACGAATTCTAC (SEQ ID NO: 98)

As noted in Table 2 and shown in FIG. 5, fragment D was constructed with a BamHI restriction site placed between the BglII site and the HindIII site at the 3′ end of the fragment. Fragment I was constructed to carry the DNA from PmlI (2491) to BstEII (2661) followed immediately by the DNA from BstEII (2955) to KpnI (3170), so that insertion of the BstEII fragment from pAM1J into the BstEII site of pAM1I in the correct orientation generates the desired sequence from 2491 to 3170. Plasmid pAM1B was digested with ApaI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1A digested with ApaI and HindIII, generating plasmid pAM1AB. Plasmid pAM1D was digested with PmlI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1AB digested with PmlI and HindIII, generating plasmid pAM1ABD. Plasmid pAM1C was digested with PmlI, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1ABD digested with PmlI, generating plasmid pAM1ABCD; insert orientation was confirmed by the appearance of a diagnostic 111 bp fragment when digested with MscI. Plasmid pAM1F was digested with BglII and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1E digested with BglII and HindIII, generating plasmid pAM1EF. Plasmid pAM1G was digested with KpnI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1EF digested with KpnI and HindIII, generating plasmid pAM1EFG. 
Plasmid pAM1J was digested with BstEII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1I digested with BstEII, generating plasmid pAM1IJ; orientation was confirmed by the appearance of a diagnostic 465 bp fragment when digested with EcoRI and EagI. Plasmid pAM1IJ was digested with PmlI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1H digested with PmlI and HindIII, generating plasmid pAM1HIJ. Plasmid pAM1M was digested with EcoRI and BstEII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1N digested with EcoRI and BstEII, generating plasmid pAM1MN. Plasmid pAM1L was digested with EcoRI and SmaI, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1MN digested with EcoRI and EcoRV, generating plasmid pAM1LMN. Plasmid pAM1LMN was digested with ApaI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1K digested with ApaI and HindIII, generating plasmid pAM1KLMN. Plasmid pAM1EFG was digested with BamHI, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1ABCD digested with BamHI and BglII, generating plasmid pAM1ABCDEFG; orientation was confirmed by the appearance of a diagnostic 552 bp fragment when digested with BglII and HindIII. Plasmid pAM1KLMN was digested with KpnI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1HIJ digested with KpnI and HindIII, generating plasmid pAM1HIJKLMN. Plasmid pAM1HIJKLMN was digested with BamHI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM1ABCDEFG digested with BamHI and HindIII, generating plasmid pAM1-1. These cloning steps are depicted in FIG. 6. FIG. 7 shows the DNA sequence of the insert contained in pAM1-1 (SEQ ID NO:1).
This insert can be cloned into any suitable expression vector as an NheI-SmaI fragment to generate an expression construct. pXF8.61 (FIG. 4), pXF8.38 (FIG. 11) and pXF8.224 (FIG. 13) are examples of such constructs.
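Several steps above confirm insert orientation by the size of a diagnostic fragment. As a minimal sketch of that reasoning, the Python below computes fragment sizes for a circular plasmid from a list of cut positions and shows how flipping an insert changes the pattern. All coordinates (a 3,000 bp plasmid, a vector site at 500, a 600 bp insert at 1,000 with a site 100 bp into it) are hypothetical and do not describe the pAM1 plasmids.

```python
def circular_fragments(plasmid_length: int, cut_positions: list[int]) -> list[int]:
    """Sorted fragment sizes (bp) after cutting a circular plasmid at the given positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [plasmid_length]  # uncut plasmid stays one circular molecule
    if cuts[0] < 0 or cuts[-1] >= plasmid_length:
        raise ValueError("cut positions must lie within the plasmid")
    sizes = [right - left for left, right in zip(cuts, cuts[1:])]
    sizes.append(plasmid_length - cuts[-1] + cuts[0])  # fragment spanning position 0
    return sorted(sizes)


def site_in_insert(insert_start: int, insert_length: int, offset: int, reverse: bool) -> int:
    """Plasmid coordinate of a site lying `offset` bp from the insert's left end
    (measured in the forward orientation)."""
    return insert_start + (insert_length - offset if reverse else offset)


# Hypothetical example: 3,000 bp recombinant plasmid, one vector site at 500,
# 600 bp insert starting at 1,000 with a site 100 bp from its left end.
for reverse in (False, True):
    cuts = [500, site_in_insert(1000, 600, 100, reverse)]
    print("reverse" if reverse else "forward", circular_fragments(3000, cuts))
# forward [600, 2400]
# reverse [1000, 2000]
```

A fragment that appears in only one orientation (600 bp versus 1,000 bp here) plays the role of the diagnostic fragment.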

Construction of pXF8.186

The “LE” version of the B-domain-deleted FVIII optimized cDNA contained in pAM1-1 was modified by replacing the Leu-Glu dipeptide (2284-2289) at the junction of the heavy and light chains with four arginine residues, making a total of five consecutive arginine residues (SEQ ID NO:2). This was achieved as follows. The six oligonucleotides shown in Table 4 were annealed, ligated, digested with EcoRI and HindIII, and cloned into pUC18 digested with EcoRI and HindIII, generating the plasmid pAM8B. FIG. 8 shows how these oligonucleotides anneal to form the requisite DNA sequence. pAM8B was digested with BamHI and BstXI, and the 230 bp insert was purified by agarose gel electrophoresis and used to replace the BamHI (2126)-BstXI (2352) fragment of the “LE” version (see FIG. 7). FIG. 9 shows the sequence of the resulting cDNA (SEQ ID NO:2). This “5Arg” version of the B-domain-deleted FVIII optimized cDNA can be cloned into any suitable expression vector as an NheI-SmaI fragment to generate an expression construct. pXF8.186 (FIG. 3) is an example of such a construct.

TABLE 4
Oligo name | Oligo length (nt) | Oligonucleotide sequence
AM8F1 | 140 | GTAGAATTCGGATCCTGGGCTGCCACAACAGCGACTTCCGCAACCGCGGCATGACCGCCCTGCTGAAGGTGAGCAGCTGCGACAAGAACACCGGCGACTACTACGAGGACAGCTACGAGGACATCAGCGCCTACCTGCTG (SEQ ID NO: 99)
AM8BF2 | 57 | AGCAAGAACAACGCCATCGAGCCCCGCAGGCGCAGGCGCGAGATCACCCGCACCACC (SEQ ID NO: 100)
AM8F4 | 58 | CTGCAGAGCGACCAGGAGGAGATCGACTACGACGACACCATCAGCGTGGAAGCTTTAC (SEQ ID NO: 101)
AM8R1 | 79 | GTAAAGCTTCCACGCTGATGGTGTCGTCGTAGTCGATCTCCTCCTGGTCGCTCTGCAGGGTGGTGCGGGTGATCTCGCG (SEQ ID NO: 102)
AM8BR2 | 57 | CCTGCGCCTGCGGGGCTCGATGGCGTTGTTCTTGCTCAGCAGGTAGGCGCTGATGTC (SEQ ID NO: 103)
AM8BR4 | 119 | CTCGTAGCTGTCCTCGTAGTAGTCGCCGGTGTTCTTGTCGCAGCTGCTCACCTTCAGCAGGGCGGTCATGCCGCGGTTGCGGAAGTCGCTGTTGTGGCAGCCCAGGATCCGAATTCTAC (SEQ ID NO: 104)
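The five-arginine junction can be checked directly from the oligonucleotide sequences. A minimal Python sketch, assuming that AM8BF2 starts on a codon boundary of the FVIII reading frame, translates it with the standard genetic code; the central CGC AGG CGC AGG CGC codons encode the run of five arginines described above, using two different arginine codons in alternation. The function names are ours.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds: str) -> str:
    """Translate complete codons of `cds`, read in frame from position 0."""
    return "".join(CODON_TABLE[cds[i : i + 3]] for i in range(0, len(cds) - 2, 3))


def longest_run(seq: str, residue: str) -> int:
    """Length of the longest consecutive run of `residue` in `seq`."""
    best = run = 0
    for aa in seq:
        run = run + 1 if aa == residue else 0
        best = max(best, run)
    return best


AM8BF2 = "AGCAAGAACAACGCCATCGAGCCCCGCAGGCGCAGGCGCGAGATCACCCGCACCACC"  # SEQ ID NO: 100
peptide = translate(AM8BF2)
print(peptide)                    # SKNNAIEPRRRRREITRTT
print(longest_run(peptide, "R"))  # 5
```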

Construction of pXF8.36

The construct for expression of human Factor VIII, pXF8.36 (FIG. 10), is an 11.1 kilobase circular DNA plasmid which contains the following elements: a cytomegalovirus immediate early I gene (CMV) 5′ flanking region comprising a promoter sequence, a 5′ untranslated sequence (5′UTS) and a first intron sequence for initiation of transcription of the Factor VIII cDNA. The CMV region is next fused with a wild-type B domain-deleted Factor VIII cDNA sequence. The Factor VIII cDNA sequence is fused, at the 3′ end, with a 0.3 kb fragment of the human growth hormone 3′ untranslated sequence. A transcription termination signal and 3′ untranslated sequence (3′UTS) of the human growth hormone gene are used to ensure processing of the message immediately following the stop codon. A selectable marker gene (the bacterial neomycin phosphotransferase (neo) gene) is inserted downstream of the Factor VIII cDNA to allow selection for stably transfected mammalian cells using the neomycin analog G418. Expression of the neo gene is under the control of the simian virus 40 (SV40) early promoter. The pUC19-based amplicon carrying the pBR322-derived β-lactamase (amp) gene and origin of replication (ori) allows for the uptake, selection and propagation of the plasmid in E. coli K-12 strains. This region was derived from the plasmid pBSII SK+.

Construction of pXF8.38

The construct for expression of human Factor VIII, pXF8.38 (FIG. 11), is an 11.1 kilobase circular DNA plasmid which contains the following elements: a cytomegalovirus immediate early I gene (CMV) 5′ flanking region comprising a promoter sequence, 5′ untranslated sequence (5′UTS) and first intron sequence for initiation of transcription of the Factor VIII cDNA. The CMV region is next fused with a synthetic, optimally configured B domain-deleted Factor VIII cDNA sequence. The Factor VIII cDNA sequence is fused, at the 3′ end, with a 0.3 kb fragment of the human growth hormone 3′ untranslated sequence. A transcription termination signal and 3′ untranslated sequence (3′UTS) of the human growth hormone gene are used to ensure processing of the message immediately following the stop codon. A selectable marker gene (the bacterial neomycin phosphotransferase (neo) gene) to allow selection for stably transfected mammalian cells using the neomycin analog G418 is inserted downstream of the Factor VIII cDNA. Expression of the neo gene is under the control of the simian virus 40 (SV40) early promoter. The pUC19-based amplicon carrying the pBR322-derived β-lactamase (amp) gene and origin of replication (ori) allows for the uptake, selection and propagation of the plasmid in E. coli K-12 strains. This region was derived from the plasmid pBSII SK+.

pXF8.269 Construct

The construct for expression of human Factor VIII, pXF8.269 (FIG. 12), is a 14.8 kilobase (kb) circular DNA plasmid which contains the following elements: a human collagen (I) α2 promoter, which contains 0.17 kb of 5′ untranslated sequence (5′UTS), and the aldolase A gene 5′ untranslated sequence (5′UTS) and first intron sequence for initiation of transcription of the Factor VIII cDNA. The aldolase intron region is next fused with a synthetic, wild-type B domain-deleted Factor VIII cDNA sequence. A transcription termination signal and 3′ untranslated sequence (3′UTS) of the human growth hormone gene are used to ensure processing of the message immediately following the stop codon. A selectable marker gene (the bacterial neomycin phosphotransferase (neo) gene) to allow selection for stably transfected mammalian cells using the neomycin analog G418 is inserted downstream of the Factor VIII cDNA. The expression of the neo gene is under the control of the SV40 promoter. The pUC19-based amplicon carrying the pBR322-derived β-lactamase (amp) gene and origin of replication (ori) allows for the uptake, selection and propagation of the plasmid in E. coli K-12 strains. This region was derived from the plasmid pBSII SK+.

pXF8.224 Construct

The construct for expression of human Factor VIII, pXF8.224 (FIG. 13), is a 14.8 kilobase (kb) circular DNA plasmid which contains the following elements: a human collagen (I) α2 promoter, which contains 0.17 kb of 5′ untranslated sequence (5′UTS), and the aldolase A gene 5′ untranslated sequence (5′UTS) and first intron sequence for initiation of transcription of the Factor VIII cDNA. The aldolase intron region is next fused with a synthetic, optimally configured B domain-deleted Factor VIII cDNA sequence. A transcription termination signal and 3′ untranslated sequence (3′UTS) of the human growth hormone gene are used to ensure processing of the message immediately following the stop codon. A selectable marker gene (the bacterial neomycin phosphotransferase (neo) gene) to allow selection for stably transfected mammalian cells using the neomycin analog G418 is inserted downstream of the Factor VIII cDNA. The expression of the neo gene is under the control of the SV40 promoter. The pUC19-based amplicon carrying the pBR322-derived β-lactamase (amp) gene and origin of replication (ori) allows for the uptake, selection and propagation of the plasmid in E. coli K-12 strains. This region was derived from the plasmid pBSII SK+.

Clotting Assay

A clotting assay based on the activated partial thromboplastin time (aPTT) (Proctor et al., Am. J. Clin. Path., 36:212-219, 1961) was performed to analyze the biological activity of the BDD hFVIII molecules expressed by constructs in which the BDD-FVIII coding region was optimized.

Biological Activity as Analyzed Using the Clotting Assay

The results of the aPTT-based clotting assay are presented in Table 5, below. Specific activity of the hFVIII preparations is presented as aPTT units per milligram of hFVIII protein as determined by ELISA. Both of the human fibroblast-derived BDD hFVIII molecules (5R and LE) have high specific activity when measured by the aPTT clotting assay. These specific activities were up to 2- to 3-fold higher than that determined for CHO cell-derived full-length FVIII (Table 5). Averages of multiple determinations of specific activity for various partially purified preparations of 5R and LE BDD hFVIII also show consistently higher values for the BDD hFVIII molecules (11,622 units/mg for 5R BDD hFVIII and 14,561 units/mg for LE BDD hFVIII, compared to 7097 units/mg for full-length CHO cell-derived FVIII). An increased rate and/or extent of thrombin activation has been observed for various BDD hFVIII molecules, possibly because the B-domain protects the heavy and light chains from thrombin cleavage and activation (Eaton et al., Biochemistry, 25:8343-8347, 1986; Meulien et al., Protein Engineering, 2:301-306, 1988).

TABLE 5
Specific Activities of Various hFVIII Proteins
hFVIII product | Concentration by ELISA (mg/mL) | aPTT activity (aPTT U/mL) | Specific activity (aPTT U/mg)
5R BDD hFVIII | 0.050 | 1306 | 26,120
LE BDD hFVIII | 0.124 | 2908 | 23,452
Full-length FVIII (CHO-derived) | 0.158 | 1454 | 9202
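Specific activity in Table 5 is aPTT activity divided by the hFVIII protein concentration measured by ELISA. The Python sketch below recomputes that column from the table values and expresses each BDD preparation relative to CHO-derived full-length FVIII; the function and variable names are illustrative.

```python
def specific_activity(aptt_u_per_ml: float, conc_mg_per_ml: float) -> float:
    """Specific activity in aPTT units per mg of hFVIII protein."""
    return aptt_u_per_ml / conc_mg_per_ml


# name: (aPTT activity in U/mL, hFVIII concentration by ELISA in mg/mL), from Table 5
TABLE5 = {
    "5R BDD hFVIII": (1306, 0.050),
    "LE BDD hFVIII": (2908, 0.124),
    "Full-length FVIII (CHO-derived)": (1454, 0.158),
}

reference = specific_activity(*TABLE5["Full-length FVIII (CHO-derived)"])
for name, (activity, conc) in TABLE5.items():
    sa = specific_activity(activity, conc)
    print(f"{name}: {sa:,.0f} U/mg ({sa / reference:.2f}x full-length)")
```

Run as written, this gives about 2.8-fold (5R) and 2.5-fold (LE) relative to full-length FVIII, within the 2- to 3-fold range stated above; the full-length value computes to about 9,202.5 U/mg, listed as 9202 in Table 5.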

Assay for Human Factor VIII in Transfected Cell Culture Supernatants

Samples of culture supernatants from cells transfected with wild-type or optimized BDD human Factor VIII constructs were assayed for human Factor VIII (hFVIII) content by enzyme-linked immunosorbent assay (ELISA). This assay is based on the use of two non-crossreacting monoclonal antibodies (mAb) in conjunction with samples consisting of cell culture media collected from the supernatants of transfected human fibroblast cells. Methods of transfection and identification of positively transfected cells are described in U.S. Pat. No. 5,641,670, which is incorporated herein by reference.

TABLE 6
Plasmid | Promoter/5′ untranslated sequence | Factor VIII cDNA composition | Mean (FVIII mU/10⁶ cells/24 hr) | Maximum (FVIII mU/10⁶ cells/24 hr) | Number of strains | Fold increase
pXF8.36 | CMV IE1 | Wild type | 567 | 2557 | 38 | -
pXF8.38 | CMV IE1 | Optimal configuration | 5403 | 17106 | 24 | 9.5X
pXF8.269 | Collagen Iα2/aldolase intron | Wild type | 382 | 1227 | 18 | -
pXF8.224 | Collagen Iα2/aldolase intron | Optimal configuration | 2022 | 11930 | 218 | 5.3X
ELISA units are based on standard curves prepared from pooled normal plasma.
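The fold-increase column compares the mean expression of each optimized construct with that of the wild-type construct sharing its promoter/5′ untranslated sequence. A minimal Python sketch of that arithmetic (the pairing is read from the table; the names are ours):

```python
# Mean expression (FVIII mU/10^6 cells/24 hr) from Table 6.
MEAN = {"pXF8.36": 567, "pXF8.38": 5403, "pXF8.269": 382, "pXF8.224": 2022}

# (promoter/5' UTS, wild-type construct, optimized construct)
PAIRS = [
    ("CMV IE1", "pXF8.36", "pXF8.38"),
    ("Collagen Iα2/aldolase intron", "pXF8.269", "pXF8.224"),
]


def fold_increase(optimized_mean: float, wild_type_mean: float) -> float:
    """Ratio of optimized to wild-type mean expression."""
    return optimized_mean / wild_type_mean


for promoter, wild_type, optimized in PAIRS:
    print(f"{promoter}: {fold_increase(MEAN[optimized], MEAN[wild_type]):.1f}X")
# CMV IE1: 9.5X
# Collagen Iα2/aldolase intron: 5.3X
```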

II. Factor IX Constructs and Uses Thereof

Construction of Synthetic Gene Encoding Clotting Factor IX

The four gene fragments listed in Table 7 and shown in FIG. 14 were made by automated oligonucleotide synthesis and cloned into plasmid pBS to generate four plasmids, pFIXA through pFIXD.

TABLE 7
Fragment | 5′ end | 3′ end
A | BamHI 1 | StuI(/FspI) 379
B | (StuI/)FspI 379 | PflMI 810
C | PflMI 810 | PstI 1115
D | PstI 1115 | BamHI 1500

As shown in FIG. 14, plasmids pFIXA through pFIXD were used to construct pFIXABCD, which carries the complete synthetic gene. Fragment A was synthesized with a PstI site 3′ to the StuI site and was cloned as a BamHI-PstI fragment. Plasmid pFIXD was digested with PstI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pFIXA digested with PstI and HindIII, generating plasmid pFIXAD. Plasmid pFIXB was digested with EcoRI and PflMI, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pFIXC digested with EcoRI and PflMI, generating plasmid pFIXBC. Plasmid pFIXBC was digested with FspI and PstI, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pFIXAD digested with StuI and PstI, generating plasmid pFIXABCD.

FIG. 15 shows the DNA sequence of the BamHI insert contained in pFIXABCD. This insert can be cloned into any suitable expression vector as a BamHI fragment to generate an expression construct. This example illustrates how a fusion site can be used in the construction even when an identical sequence exists in close proximity (fragments A, B and D all contain the hexamer “AGGGCA”, the product of blunt-end ligation of StuI- and FspI-digested DNA). This is possible because the resulting fusion sites are not cut by the restriction enzymes used to create them. This example also illustrates how the gene fragments can be synthesized with additional restriction sites outside of the actual gene sequence, and how these sites can be used to facilitate intermediate cloning steps.
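The fusion-site argument can be made concrete at the sequence level. StuI (AGG^CCT) and FspI (TGC^GCA) both cut to leave blunt ends, so joining the left half of a StuI site to the right half of an FspI site yields AGGGCA, which matches neither recognition sequence. The Python sketch below models only the top strand, which suffices for blunt ends; the two toy input sequences are hypothetical.

```python
# Blunt cutters: (recognition site, cut position within the site on the top strand).
STUI = ("AGGCCT", 3)  # AGG^CCT
FSPI = ("TGCGCA", 3)  # TGC^GCA


def blunt_cut(seq: str, enzyme: tuple[str, int]) -> tuple[str, str]:
    """Cut `seq` at the first occurrence of the enzyme's site; return (left, right)."""
    site, offset = enzyme
    i = seq.find(site)
    if i < 0:
        raise ValueError(f"site {site} not found")
    return seq[: i + offset], seq[i + offset :]


# Hypothetical toy fragments, one carrying a StuI site and one an FspI site.
upstream = "GGATCCAAAAGGCCTTTT"
downstream = "CCCCTGCGCAGGGG"

left, _ = blunt_cut(upstream, STUI)     # keeps ...AGG
_, right = blunt_cut(downstream, FSPI)  # keeps GCA...
fusion = left + right

print(fusion)                                # GGATCCAAAAGGGCAGGGG
print("AGGGCA" in fusion)                    # True
print(STUI[0] in fusion, FSPI[0] in fusion)  # False False
```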

Expression of Human Factor IX from Optimized and Non-Optimized cDNA

The construct for the expression of human Factor IX (FIG. 16), pXIX76, is an 8.4 kilobase (kb) circular DNA plasmid which contains the following elements: a cytomegalovirus (CMV) immediate early I gene 5′ flanking region comprising a promoter sequence, 5′ untranslated sequence (5′UTS) and a first intron sequence. The CMV region is next fused with a wild-type Factor IX cDNA sequence, with a BamHI site at the junction. The Factor IX cDNA sequence is next fused to a 1.5 kb fragment from the 3′ region of the Factor IX gene that includes the transcription termination signal. A selectable marker gene (the bacterial neomycin phosphotransferase gene (neo)) to allow selection for stably transfected mammalian cells using the neomycin analog G418 is inserted upstream of the CMV sequences. Expression of the neo gene is under the control of the herpes simplex virus thymidine kinase promoter. The pUC19-based amplicon carrying the pBR322-derived beta-lactamase gene and origin of replication allows for the selection and propagation of the plasmid in E. coli.

Plasmid pXIX170, containing a Factor IX coding region with an optimized configuration, can be derived from pXIX76 by digestion with BamHI and MI and insertion of the BamHI fragment shown in FIG. 15, thus producing an equivalent construct that directs the expression of human Factor IX from an optimized cDNA.

Samples of cell culture supernatants from normal human foreskin fibroblast clones transfected with either wild-type or optimized expression constructs were assayed for expression of Factor IX. As seen in Table 8, a 2.7-fold increase in mean expression of Factor IX could be demonstrated when the optimized cDNA was substituted for the wild-type sequence.

TABLE 8 Expression data for strains expressing Factor IX Promoter/5′Mean Maximum Number untranslated cDNA Nanograms/10⁶ of Cell Plasmidsequence composition cells/24 hr Strains pXIX76 CMV Wild Type 418 8384144 pXIX170 CMV Optimal 1127 3316 33 Configuration

III. Alpha-Galactosidase Constructs and Uses Thereof

Construction of a Synthetic Gene Encoding α-Galactosidase

The four gene fragments listed in Table 9 were made by automated oligonucleotide synthesis and cloned into the vector pUC18 as EcoRI-HindIII fragments (with the N-terminus of each gene fragment adjacent to the EcoRI site) to generate four plasmids, pAM2A through pAM2D.

TABLE 9
Fragment | 5′ end | 3′ end
A | BamHI 1 | PstI 364
B | PstI 364 | BglII(/BamHI) 697
C | (BglII/)BamHI 697 | SmaI(/StuI) 1012
D | (SmaI/)StuI 1012 | XhoI 1347

Plasmids pAM2A through pAM2D were used to construct pAM2ABCD, which carries the complete synthetic gene. Plasmid pAM2B was digested with PstI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM2A digested with PstI and HindIII, generating plasmid pAM2AB. Plasmid pAM2D was digested with StuI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM2C digested with SmaI and HindIII, generating plasmid pAM2CD. Plasmid pAM2CD was digested with BamHI and HindIII, and the insert was purified by agarose gel electrophoresis and inserted into plasmid pAM2AB digested with BglII and HindIII, generating plasmid pAM2ABCD.

FIG. 17 shows the DNA sequence of the BamHI-XhoI fragment contained in pAM2ABCD. This insert can be cloned into any suitable expression vector as a BamHI-XhoI fragment to generate an expression construct. This example illustrates the use of fusion sites that arise from the ligation of two complementary overhangs (BglII/BamHI) and from the ligation of blunt ends (SmaI/StuI).
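Both fusion types in this example can be checked the same way. BamHI (G^GATCC) and BglII (A^GATCT) leave identical 5′ GATC overhangs, so their ends can be ligated, whereas SmaI (CCC^GGG) and StuI (AGG^CCT) both leave blunt ends. A minimal Python sketch, tracking the top strand only and using hypothetical toy sequences, checks end compatibility and confirms that neither junction recreates a site for the enzymes that produced it.

```python
# (recognition site, top-strand cut offset); all four sites are palindromic.
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC -> 5' GATC overhang
    "BglII": ("AGATCT", 1),  # A^GATCT -> 5' GATC overhang
    "SmaI": ("CCCGGG", 3),   # CCC^GGG -> blunt
    "StuI": ("AGGCCT", 3),   # AGG^CCT -> blunt
}


def overhang(name: str) -> str:
    """Single-stranded 5' overhang left by the enzyme ('' for blunt cutters)."""
    site, offset = ENZYMES[name]
    return site[offset : len(site) - offset]


def cut(seq: str, name: str) -> tuple[str, str]:
    """Split the top strand at the first site for `name`; return (left, right)."""
    site, offset = ENZYMES[name]
    i = seq.find(site)
    if i < 0:
        raise ValueError(f"no {name} site")
    return seq[: i + offset], seq[i + offset :]


def ligate(left_seq: str, left_enzyme: str, right_seq: str, right_enzyme: str) -> str:
    """Join the left part of one cut to the right part of another if the ends match."""
    if overhang(left_enzyme) != overhang(right_enzyme):
        raise ValueError("incompatible ends")
    return cut(left_seq, left_enzyme)[0] + cut(right_seq, right_enzyme)[1]


def uncut_by(seq: str, *names: str) -> bool:
    """True if none of the named enzymes' sites occur in `seq`."""
    return all(ENZYMES[n][0] not in seq for n in names)


# Hypothetical toy sequences.
sticky = ligate("CCCAGATCTAAA", "BglII", "TTTGGATCCGGG", "BamHI")
blunt = ligate("TTTCCCGGGTTT", "SmaI", "AAAAGGCCTAAA", "StuI")
print(sticky, uncut_by(sticky, "BglII", "BamHI"))  # CCCAGATCCGGG True
print(blunt, uncut_by(blunt, "SmaI", "StuI"))      # TTTCCCCCTAAA True
```

Trying to join a BglII end to a SmaI end raises `ValueError`, mirroring the fact that a sticky end and a blunt end do not ligate directly.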

Expression of Human α-Galactosidase from Optimized and Non-Optimized cDNAs

The construct for the expression of human α-galactosidase, plasmid pXAG94 (FIG. 18), is an 8.5 kb circular DNA plasmid which contains the following elements. A selectable marker gene (the bacterial neomycin phosphotransferase gene (neo)) is inserted upstream of the α-galactosidase expression cassette to allow selection for stably transfected mammalian cells using the neomycin analog G418. Expression of the neo gene is under the control of the SV40 early promoter. Polyadenylation signals for this expression cassette are supplied by sequences 3393-3634 of SYNPRSVNEO. This selectable marker is fused to a short plasmid sequence, equivalent to nucleotides 2067 (PvuII)-2122 of SYNPBR322.

Expression of the α-galactosidase cDNA is directed from a CMV enhancer. This DNA is fused via the linker sequence TCGACAAGCCGAATTCCAGCACACTGGCGGCCGTTACTAGTGGATCCGAG (SEQ ID NO:107) to human elongation factor 1α sequences extending from −207 to +982 nucleotides relative to the cap site. These sequences provide the EF1 alpha promoter, cap site and a 943 nucleotide intron present in the 5′ untranslated sequences of this gene. The DNA is next fused to the linker sequence GAATTCTCTAGATCGAATTCCTGCAGCCCGGGGGATCCACC (SEQ ID NO:108), followed immediately by 335 nucleotides of the human growth hormone gene, starting with the ATG initiator codon. This DNA codes for the signal peptide of the hGH gene, including the first intron.
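The linkers are given only as raw DNA. As a hedged illustration, a simple scan for a handful of common six-base restriction sites (the selection of enzymes is ours, not the source's) shows that SEQ ID NO:108 carries EcoRI, XbaI, PstI, SmaI and BamHI sites, consistent with its role as a cloning linker; the text does not say which of these sites, if any, were used. Because all six sites listed are palindromic, scanning the top strand is sufficient.

```python
# A small, illustrative selection of six-base restriction sites (our choice of enzymes).
SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "PstI": "CTGCAG",
    "SmaI": "CCCGGG",
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
}


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[int]]:
    """0-based start positions of each recognition site found in `seq` (top strand)."""
    hits = {}
    for name, site in sites.items():
        positions = [i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)]
        if positions:
            hits[name] = positions
    return hits


LINKER_108 = "GAATTCTCTAGATCGAATTCCTGCAGCCCGGGGGATCCACC"  # SEQ ID NO:108
print(find_sites(LINKER_108))
# {'EcoRI': [0, 14], 'XbaI': [6], 'PstI': [20], 'SmaI': [26], 'BamHI': [32]}
```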

This DNA is next fused to the portion of the wild-type α-galactosidase cDNA that codes for amino acids 31 to 429. The coding region is next fused via the linker AAAAAAAAAAAACTCGAGCTCTAG (SEQ ID NO:109) to the 3′ untranslated region of the hGH gene. Finally, this DNA is fused to a pUC-based amplicon carrying the pBR322-derived beta-lactamase gene and origin of replication, which allows for the selection and propagation of the plasmid in E. coli; the sequences are equivalent to nucleotides 229-1/2680-281 of SYNPUC12V.

Plasmid pXAG95 is equivalent to pXAG94, with the α-galactosidase cDNA sequence replaced with the corresponding optimized configuration sequence (coding for amino acids 31 to 429) from FIG. 17.

Plasmid pXAG73 (FIG. 19) is a 10 kb plasmid similar to pXAG94, but with the following differences. The linker sequence GCCGAATTCCAGCACACTGGCGGCCGTTACTAGTGGATCCGAG (SEQ ID NO: 110) and the adjacent EF1 alpha DNA as far as +30 beyond the cap site have been replaced with the mouse metallothionein promoter and cap site (nucleotides −1752 to +54 relative to the mMTI cap site). Also, the attachment of the EF1α UTS to the hGH coding sequence differs: EF1α sequences extend as far as +973 from the EF1α cap site, followed by the linker CTAGGATCCACC (SEQ ID NO:111), in place of the GAATTCTCTAGATCGAATTCCTGCAGCCCGGGGGATCCACC (SEQ ID NO:108) linker described above.

Plasmid pXAG74 is equivalent to pXAG73, with the wild-type α-galactosidase cDNA sequence replaced with the corresponding optimized configuration sequence (coding for amino acids 31 to 429) from FIG. 17.

The construction of such plasmids, including the creation of hGH-α-galactosidase fusions, is described in U.S. Pat. No. 6,083,725, which is incorporated herein by reference.

Samples of cell culture supernatants from normal human foreskin fibroblast clones transfected with either wild-type or optimized expression constructs were assayed for expression of α-galactosidase.

TABLE 10
Expression data for strains expressing alpha-galactosidase
Plasmid | Promoter/5′ untranslated sequence | cDNA composition | Mean (units/10⁶ cells/24 hr) | Maximum (units/10⁶ cells/24 hr) | Number of cell strains
pXAG-73 | CMV/mMT/EF1a | Wild type | 323 | 752 | 12
pXAG-74 | CMV/mMT/EF1a | Optimal configuration | 1845 | 8586 | 27
pXAG-94 | CMV/EF1a | Wild type | 417 | 1758 | 39
pXAG-95 | CMV/EF1a | Optimal configuration | 842 | 3751 | 75
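Because Table 10 pairs each optimized construct with a wild-type construct carrying the same promoter arrangement, the fold increases can be recomputed directly. A short Python sketch (names are ours) computes both the mean and the maximum fold change for each pair; it shows that the 5.7-fold mean increase belongs to the mMT-based pair (pXAG-74 vs pXAG-73) and the 2.0-fold increase to the EF1a pair (pXAG-95 vs pXAG-94).

```python
# Table 10: (mean, maximum) in units/10^6 cells/24 hr.
TABLE10 = {
    "pXAG-73": (323, 752),    # CMV/mMT/EF1a, wild type
    "pXAG-74": (1845, 8586),  # CMV/mMT/EF1a, optimal configuration
    "pXAG-94": (417, 1758),   # CMV/EF1a, wild type
    "pXAG-95": (842, 3751),   # CMV/EF1a, optimal configuration
}


def fold_changes(wild_type: str, optimized: str) -> tuple[float, float]:
    """(mean fold, maximum fold) of an optimized construct over its wild-type partner."""
    (wt_mean, wt_max), (opt_mean, opt_max) = TABLE10[wild_type], TABLE10[optimized]
    return opt_mean / wt_mean, opt_max / wt_max


for promoter, wt, opt in [("mMT", "pXAG-73", "pXAG-74"), ("EF1a", "pXAG-94", "pXAG-95")]:
    mean_fold, max_fold = fold_changes(wt, opt)
    print(f"{promoter} ({opt} vs {wt}): mean {mean_fold:.1f}x, maximum {max_fold:.1f}x")
# mMT (pXAG-74 vs pXAG-73): mean 5.7x, maximum 11.4x
# EF1a (pXAG-95 vs pXAG-94): mean 2.0x, maximum 2.1x
```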

As shown in Table 10, 5.7- and 2.0-fold increases in mean α-galactosidase expression were seen when the optimized cDNA was expressed from the mMT1 (pXAG-74) and EF1a (pXAG-95) promoters, respectively, compared to the wild-type coding sequences. Furthermore, significant increases in maximum expression were also seen when the optimized cDNA was expressed from either promoter.

All patents and other references cited herein are hereby incorporated by reference.

EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

1. A synthetic nucleic acid sequence which encodes a protein wherein at least one non-common codon or less-common codon has been replaced by a common codon, and wherein the synthetic nucleic acid sequence comprises a continuous stretch of at least 90 codons all of which are common codons, or wherein the synthetic nucleic acid sequence comprises a continuous stretch of common codons, which continuous stretch includes at least 33% or more of the codons in the synthetic nucleic acid sequence, or wherein at least 94% or more of the codons in the sequence encoding the protein are common codons and the synthetic nucleic acid sequence encodes a protein of at least about 90 amino acids in length, and wherein the protein is selected from the group consisting of: blood clotting factor V, blood clotting factor VII, blood clotting factor X, blood clotting factor XIII; an interleukin; erythropoietin (EPO); calcitonin; growth hormone; insulin; insulinotropin; an insulin-like growth factor; parathyroid hormone; Θ-interferon; K-interferon; a nerve growth factor; FSHΘ; tumor necrosis factor; glucagon; bone growth factor-2; bone growth factor-7; TSH-Θ; CSF-granulocyte; CSF-macrophage; CSF-granulocyte/macrophage; an immunoglobulin; a catalytic antibody; protein kinase C; glucocerebrosidase; superoxide dismutase; tissue plasminogen activator; urokinase; antithrombin III; DNAse; tyrosine hydroxylase; apolipoprotein E; apolipoprotein A-I; a globin; low density lipoprotein receptor; IL-2 receptor; an IL-2 antagonist; alpha-1 antitrypsin; soluble CD4; a protein encoded by a virus; an antigen; a protein which does not occur in nature; glucagon-like peptide-1 (GLP-1); β-glucoceramidase; α-iduronidase; α-L-iduronidase; glucosamine-N-sulfatase; alpha-N-acetylglucosaminidase; acetyl-coenzyme A:α-glucosaminide-N-acetyltransferase; N-acetylglucosamine-6-sulfatase; β-galactosidase; N-acetylgalactosamine-6-sulfatase; and β-glucuronidase.

2. The nucleic acid of claim 1, wherein the number of non-common or less-common codons replaced or remaining is less than 15.

3. The nucleic acid of claim 1, wherein all of the non-common and less-common codons of the synthetic nucleic acid sequence encoding a protein have been replaced with common codons.

4. A vector comprising the synthetic nucleic acid sequence of claim 1.

5. A cell comprising the nucleic acid sequence of claim 1.
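Claim 1 recites three numeric codon-usage alternatives: a continuous run of at least 90 common codons, a continuous run covering at least 33% of the codons, or at least 94% common codons in a sequence encoding a protein of at least about 90 amino acids. The Python sketch below computes those quantities for a coding sequence. Which codons count as "common" is defined earlier in this document, so the set is supplied by the caller; the three-codon set in the example is a placeholder only. The threshold check is a simplified, hypothetical reading of the claim language (it uses the codon count as a proxy for protein length) and ignores the other claim elements, such as codon replacement and the listed proteins.

```python
def common_codon_metrics(cds: str, common: set[str]) -> dict:
    """Summarize how a coding sequence uses a caller-supplied set of 'common' codons.

    Every complete triplet in `cds` (read in frame from position 0) is counted.
    """
    codons = [cds[i : i + 3] for i in range(0, len(cds) - 2, 3)]
    longest = run = n_common = 0
    for codon in codons:
        if codon in common:
            run += 1
            n_common += 1
            longest = max(longest, run)
        else:
            run = 0  # a non-common codon breaks the continuous stretch
    n = len(codons)
    return {
        "codons": n,
        "longest_run": longest,
        "run_fraction": longest / n if n else 0.0,
        "common_fraction": n_common / n if n else 0.0,
    }


def meets_claim1_thresholds(m: dict) -> bool:
    """Hypothetical reading of the three numeric alternatives in claim 1."""
    return (
        m["longest_run"] >= 90
        or m["run_fraction"] >= 0.33
        or (m["common_fraction"] >= 0.94 and m["codons"] >= 90)
    )


# Placeholder codon set for illustration only; substitute the document's own definition.
TOY_COMMON = {"ATG", "CTG", "GCC"}
metrics = common_codon_metrics("ATG" + "CTG" * 5 + "TTA" + "GCC" * 3, TOY_COMMON)
print(metrics)  # {'codons': 10, 'longest_run': 6, 'run_fraction': 0.6, 'common_fraction': 0.9}
print(meets_claim1_thresholds(metrics))  # True
```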